Blog

Company News

Drew Furgiuele‘s PowerShell for DBAs class has been getting rave reviews, including:

“Fantastic class! This is the best online training I’ve ever taken. Very slick setup using the webinar for learning, an AWS VM for labs, and slack for interacting with the class. This course teaches more than just PowerShell for SQL Server – the general training on day 1 will leave you ready to tackle anything with PowerShell. Drew is a great instructor” – Grant Schulte

“I’ve been wanting to add PowerShell to my DBA toolbelt. This class was exactly what I needed. The instructor, Drew, knows his stuff and is well-known in the SQL Server community. If you want to learn PowerShell for SQL Server, start here.” – Kevin Kelso

“I really enjoyed this class! I knew very little PowerShell and was intimidated when looking at scripts I found online. I feel much more comfortable trying my hand at PowerShell after taking this class. Drew was a great instructor, and I thought the material, demos, and labs were all very good. I also appreciate that the class recordings are available – I had to miss part of the class due to some work issues.” – Lois Scarane

I could go on and on – the reviews certainly do, and if you’re interested, you should check ’em out. Drew’s next class starts July 26th, and registration is open now.

So when Ben Miller came to us and suggested doing a Level 2 course, I was hooked. Here’s what he came up with:

PowerShell for DBAs: Level 2

Do you know some PowerShell and want to take your knowledge to the next level? Do you want a consistent way to manage your servers, databases and services without clicking?

In this 2-day class you will learn to:

Run commands on remote servers using PowerShell Remoting
Set trace flags and protocols on remote servers, then restart them to take effect
Use WMI to get information about disk space
Use the SQL Server provider to navigate SQL Server within a path-like structure
Use PowerShell to determine the existence of objects in SQL Server
Learn SMO and use it to manage a SQL Server and Databases. You will learn how to change configuration properties of a server, change Database Options, add space to a database file or log and many other database management tasks.

Prerequisites: attendees should have either taken the PowerShell for DBAs course, or have 6-12 months of hands-on production PowerShell experience.

About the instructor: Ben Miller is a Microsoft SQL Server MVP and Microsoft Certified Master (MCM). He’s been working with SQL Server in the field since 1997, including 7 years at Microsoft. He is passionate about SQL Server automation and integration, and uses PowerShell and SMO regularly.

Like all our classes, it includes Instant Replay so you can stream a recorded version of the class for a year after it finishes. Students have been raving about how helpful that is when they get pulled away from work for an emergency.

Head on over and check it out! Registration is open now for the June 11-12 class.

Training Week: Announcing Edwin’s New Class on Failover Clustered Instances

Last Updated April 11, 2018

Clustering, Company News

This week, we’ve got a bunch of announcements about new training classes. Next up, Edwin Sarmiento: his 3-day Always On Availability Group class has been getting great reviews:

“This class is fantastic. There is no filler, and no needless repetition, so be prepared to pay attention the entire duration. Edwin clearly is very passionate about his craft and does an incredible job of sharing his knowledge. Great balance of theory and application. I went through 4 glitter pens taking notes. A+” – Jordan

“Enjoyed the class. It was a different approach that I found very useful. I was looking for information about how and when you would use AG. The troubleshooting aspect was very good and as someone has already said, it helped me to understand the dynamics between all the working parts.” – Paula Luther

“This session was amazing. I had previously had some experience with availability groups, but Edwin’s knowledge really helped me understand the dynamics between all the working parts. I am more confident to work with availability groups, and I fully understand many situations that I did not before.” – Michael

“The detail and pace were fantastic! Edwin’s energy and knowledge ensured a captive audience as evidenced by the level of engagement in the channel – which I also found extremely useful for asking questions or just looking for clarification. Overall, an A+ training session – I’m ready to build my first multi-site AG solution!” – Chris

Now he’s bringing out another class.

Always On Failover Clustered Instances:
The Senior DBA’s Field Guide

You need to build or manage a SQL Server failover clustered instance (FCI) but you’re not sure where to start. And with Always On Availability Groups as a high availability solution, it becomes even more confusing. Properly sizing the hardware? Who is responsible for what? How does the Windows Server Failover Cluster work with SQL Server?

The SQL Server FCI is not dead. Even though Always On Availability Groups were introduced in SQL Server 2012, customers are still deploying SQL Server FCIs to protect mission-critical databases. This instructor-led training class is specifically designed for senior database administrators responsible for designing, implementing and managing a SQL Server FCI.

In this live 3-day class, attendees will learn:

Fundamentals of Windows Server Failover Clustering (WSFC) – the underlying platform that makes SQL Server FCI possible – from the external dependencies like Active Directory and DNS to quorum, shared storage and cluster configuration
Designing and implementing a Windows Server Failover Cluster to meet both high availability and disaster recovery requirements
Designing and implementing common topologies for SQL Server FCI solutions for a single- or multi-data center deployments
Managing and monitoring SQL Server FCI implementations
Effective troubleshooting of availability issues for both the Windows Server Failover Cluster and the SQL Server FCI

About the instructor: Edwin Sarmiento is a Microsoft Data Platform MVP and Microsoft Certified Master from Ottawa, Canada specializing in high availability, disaster recovery and system infrastructures running on the Microsoft server technology stack. He is very passionate about technology but has interests in music, professional and organizational development, leadership and management matters when not working with databases.

Online class times:

June 18-20, 2018 9AM-5PM Eastern (click to see in your local time zone)
September 11-13, 2018 9AM-5PM Eastern

Head on over and check it out. See you in class!

Performance Tuning in 21 Demos at the PASS Summit 2018

Last Updated October 21, 2018

Brent Ozar Unlimited Team

#SQLPass, Company News

You’re a developer or DBA who wants to tune indexes and queries to make your application go faster. You don’t like learning by watching slides or reading white papers – you want to jump right into the code and see what happens. You’re the kind of person who loves watching YouTube videos that start with, “Hold my beer and watch this.”

In one demo-filled day with Brent Ozar and Erik Darling, you’ll see SQL Server, Azure SQL DB, and Azure Managed Instances make questionable lifestyle choices. The engine will wildly underestimate the work required with a query, pick the wrong indexes, and jump off the roof into the pool while holding the TV – all in an effort to deliver the right query results. We’ll show you how to coach the database engine into better performance and less problems.

Everybody will get their own USB flash drive to take home with the StackOverflow2013 database (just 50GB, so it’s easier to do the demos on smaller laptops), the demo scripts, and PDFs of the slides.

Attendees will also get 1 year of access to:

Our DBA training videos: DBA Fundamentals, The Senior DBA Class, and the DBA Job Interview Questions & Answers Kit
Our performance tuning videos: Diagnosing Slow SQL Servers with Wait Stats, How to Design and Tune Indexes, Statistics, T-SQL Level Up: Level 2, and Query Plan Analysis
Just those above more than pay for the cost of the $499 pre-con, but..let’s throw in a stretch goal: if we sell out again, attendees get a year of SQL ConstantCare® too!

Our pre-con sold out with 360 attendees last year, and I heard from a lot of frustrated folks that couldn’t get in because they didn’t move fast enough. Don’t make that mistake again this year – go claim your spot now.

Check out the list of pre-cons, and then register quick.

[Video] Office Hours 2018/4/4 with Special Guest Pinal Dave

Last Updated April 7, 2018

Videos

This week, Pinal Dave joins Brent, Tara, and Richie to discuss replication latency, partitioning tables, transactional replication, troubleshooting 3-rd party application performance, SQL ConstantCare® update, SQL Server errors, copying data over from a database with read-only access, reducing blocking timeouts, backups to the cloud, and more replication questions!

Here’s the video on YouTube:

Office Hours Webcast - 2018/4/4 with Special Guest Pinal Dave

Watch this video on YouTube

You can register to attend next week’s Office Hours, or subscribe to our podcast to listen on the go.

Enjoy the Podcast?

Don’t miss an episode, subscribe via iTunes, Stitcher or RSS.
Leave us a review in iTunes

Office Hours – 4-4-18

How do you monitor replication latency?

Brent Ozar: Our first question comes in from [Tishal], [Tishal] says, “I was preparing a report using data collected from management data warehouse. I was using one of its tables, snapshots, performance counter values, and it has some columns in it formatted as floats.” He says, “How do you interpret the results of these columns for replication latency?” Tara, you’ve done a lot of replication work. When you want to find out how latent replication is, what do you use in order to do it?

Tara Kizer: I usually just open up replication monitor and then I insert a tracer token. Replication monitor will tell you what the replication is. It’s an estimation, but if you insert a tracer token, it will watch it flow from the publisher to the distributor. It will tell you what that latency is and the distributer and subscriber and what that latency is. So tracer token is a really good way. Some people, via code, they insert tracer tokens and then log that to a table and send out emails, but I figure what the script is to run to query it, but you could run an extended events session to see what replication monitor is doing to grab that latency number anyway and then monitor that via scripts instead of using the GUI.

Brent Ozar: Yeah, use your own scripts to inject a token whenever you want.

I run out of disk space when I create a clustered index.

Brent Ozar: Pablo says, “With a million row BI table, I’m trying to create a clustered index and I’m constantly running out of disk space. Will partitioning help?” Pinal, what do you think?

Pinal Dave: That’s a very interesting question. Actually, this reminds me of the time, years ago, when the disks were very, very expensive. I remember, I had experienced this one and I, even though there was no real reason, I had to partition my table. But the partitioning was actually just done just because of disk error. So yes, partitioning would help, but it makes sense that you create the second partition on a different drive. Same drive, you’re going to be walking into the same problem. Different drives – now you need to do a partitioning, partitioning functions as well as you need to go through the center and logic about which is the key and then you create this horizontal partitioning.

So yes, partitioning would help, but if this is not something you’re looking for, you may want to look at how you can get more space on a disk by changing the disk. Maybe a [longer run tact] would be easier, or any other thing like delete some stuff or move some of the database out of that partition. Or do something like that, that would be more meaningful instead of just doing partitioning for the sake of partitioning. And when you are creating the clustered index, the heart of your table – so that’s my opinion. So create partitioning if there is no option, otherwise, just do what works, actually. At the end of the day, your boss should say, you are able to manage your database.

Brent Ozar: That’s what I say to Richie all the time. Richie, you are able to manage your database. I was just, not even an hour ago, I was telling Richie, “Richie, I want to buy a bigger database server.” And he was like, nope.

Richie Rump: It’s not our problem, Brent. “But I want to. I want to try it.”

Brent Ozar: I want to throw money at it.

Is replication a good fit to copy data between servers?

Brent Ozar: J.D. asks, he wants to do – he says, “Is transactional replication the best way to copy data from a vendor app on server one to a different database on server two? It needs to be as close to real-time as possible. How should I move data between those?”

Tara Kizer: What is with all the replication questions?

Brent Ozar: It’s because you’re here, Tara.

Tara Kizer: I mean, best – I don’t know. Are you using Standard Edition, Enterprise Edition – because if I’m using Enterprise Edition, I’m not using transactional replication. I’m going to be using availability groups and, you know – it just depends what you’re trying to do here. All of the technologies have some latency, unless you’re doing synchronous through mirroring in availability groups.

Brent Ozar: He follows up with, he’s, “On Enterprise but it’s 2008R2.” It does say that they might upgrade to 2017 soon.

Tara Kizer: Yeah, so transactional replication is a good tool to copy your data to another server, but you have to be aware that there is going to be latency at times when big transactions run. Make sure your replication carpology is best practices. You’ve got a publisher on one server, a distributor on another server and your subscribers on another server. Don’t put the distributor on the publisher or subscriber.

I mean, you could, if you’ve got the hardware for it, but best practice is to separate these guys. And just know that it’s replicating inert, update and delete commands and updates get converted into a delete and an insert. So it’s at a very high level and so there can be some latency flowing through the pipeline, whereas mirroring and availability groups occur at a much lower level. But yeah, 2008 Enterprise Edition transactional replication is a good feature, it’s just I’ve spent so many hours troubleshooting it and, you know, waiting for a snapshot to run and waiting for initialization to happen. I mean, wow, a lot of hours.

Brent Ozar: If somebody offered you a new production DBA job and they said you can be a DBA but you have to work with replication fulltime, what would you say?

Tara Kizer: I don’t know – for me, taking another job, it depends upon benefits. I start there and a lot of times I don’t care how bad the job is if the benefits are good. But for me, am I going to be on call and have to work at three in the morning all the time – I don’t mind replication so much, you know. It was stable in my last environment and we didn’t have to do too much with it.

Pinal Dave: When it works, yes.

Tara Kizer: When it works, exactly, that’s the key…

Pinal Dave: So one thing I would just make a comment here is that you will find a lot of experts who can configure the replication, but the people who can just really troubleshoot the replication are, I think, there are only 1%. It’s not an easy task and, yes, configuring, click, click, click, go, done, fine, but how do you fix it?

Tara Kizer: And I’ve even opened up a case with Microsoft, many years ago, to – I was getting a weird replication error and I knew that the snapshot and that whole process was going to take several hours and so I was hoping to fix replication rather than go through that process. And the things that they had me do to replication is not something that you find blogs about. So, I had to open this case with them – and I don’t remember what the outcome was, if they fixed it or not because it was just way too many years ago. But it was just crazy, the queries I was running. And they’re like, grab this, grab that, look at this – it’s just like, how are you supposed to know how to do this stuff?

Brent Ozar: Richie, you look like there’s something you want to say in there too.

Richie Rump: I hate the Cubs.

Brent Ozar: Did they lose?

Richie Rump: They, like, haven’t scored a run in like two games, so. But that’s what I wanted to say, so…

Brent Ozar: You can always tell when baseball season starts with Richie. He’s very preoccupied – this is his one. Justin Setliff says, “Richie, don’t worry. It’s early.” It looked like a blue screen of death flag for a second there.

Richie Rump: And that’s what it feels like.

How do you prove other services are causing performance problems?

Brent Ozar: Nicholas asks, “Hi gang, how can you prove that other processes and services like Apache or internet information services on the same box are causing the SQL Server performance problems?”

Tara Kizer: I would never be able to prove it because I wouldn’t allow those on my SQL Server. So I mean, I need a production DBA here and I just – no, you’re not putting that stuff on my box.

Brent Ozar: And why not?

Tara Kizer: Because SQL Server is a memory hog, IS is a memory hog, I mean, they could be on the same box if you’re talking about a very, very small system, I guess. I mean, maybe in a test environment, but in production, I want to go by best practices and try to avoid having issues at three in the morning.

Pinal Dave: Right, and one of the things – yesterday, I was traveling through India and I was in Pune and my customer had an interesting scenario. They thought the reporting service is just fine and they have absolutely no issue and they put it on the same box as SQL Server. And I was like, okay, we just try to debug why they have a lot of I/O and we tried to start debugging with the various T-SQL scripts. And suddenly, the accidentally opened a task manager and we can see from the task manager’s view that reporting service is taking the maximum amount of the memory.

They were like, “I don’t know why, what was going on.” And we can see that it was taking 30% of their box’s memory and we were like, whoa. I mean, we didn’t even have to run any diagnostic script, the task manager was saying it. And as soon as we had just decided to turn it off and kill it and they had a little minor performance improvement; minor. And we believe, after they might have restarted, it would have worked. So once in a while, the layman’s, or not so smart, solution, like task manager, also tells a story. So we should definitely run all the fancier script, but do not ignore the task manager. That’s what I just told my customer yesterday and they thought, “Oh do you tell this to everybody?” I said, “All the time,” though even I said first time yesterday, yes.

Tara Kizer: I too use task manager. Sure, I know how to use other things also, but it’s so easy – right click, task manager, and there it is.

Brent Ozar: And everybody gets it, like, they understand, we’re not hiding something from you, it’s built into the system. Nicholas says about why he put things on multiple servers, he says, “Sometimes you inherit things.” Man, you’ve got to get better relatives so you don’t have such crappy inheritance.

Why are vendors telling me to replicate to the cloud?

Brent Ozar: Daryl says, “I got an email from a vendor yesterday telling me that I should replicate to the cloud. I thought last week that we were saying we should not do replication. I thought that Brent Ozar Unlimited was saying not do replication.” Well generally, if a vendor is telling you to do something, they’re usually also selling you a tool to do it at exactly the same time. So just think about that one.

Tara Kizer: What problem are they trying to solve here? Why are they saying to replicate to the cloud? What is the issue here? Is it disaster recovery? What is it?

Pinal Dave: I’ll just add a point there because I think I know a little bit of context of this because I have seen a similar email and I little bit read through it. I don’t know exactly situation of the Daryl, but I will just say it this way, I think Quest has a product, I think, and that product is trying to talk about hybrid replication, which is not native to the SQL Server and they say you can go to hybrid as well as heterogeneous environment. And at that time, Always On is not a solution, might be you may want to consider replication. So yes, replication is still true, but in replication not in terms of the SQL Server replication, replication in terms of taking your data and creating a replica of it at a different location. So maybe that’s what they are meaning. And the product might be able to solve the problem, as Brent clearly said.

Brent Ozar: And I’m a fan of putting data in the cloud, too. Like, if you want t reporting copy up there – because classically, a lot of our users are out in the cloud or they’re out somewhere else. And if your building’s internet connection goes down and they still want to be able to run their reports, replicating up to the cloud can make sense. You just want to make sure that you’re solving a problem that the business has.

Richie Rump: I just want to point out that Tara’s wearing a Quest sweatshirt.

Tara Kizer: Usually, my old camera would have been up to here. It’s just cold in my house and this is my favorite thing around the house.

Brent Ozar: I’m not going to lie, Quest has – the Quest velour kind of, whatever that thing is, that’s one of my favorite things that I keep over in the office as well. It’s so soft. If you ever get the chance to get, not a vendor t-shirt, because vendor t-shirts aren’t that good, but if a vendor gives you something nice like a hoodie, a sweatshirt, it’s going to be really nice. Same thing with the jackets – they give good stuff out.

Richie Rump: Yeah, I have an Azure hoodie that I got from the PDC where they announced Azure, when nobody could say Azure… I still have that and I wear it all the time.

Brent Ozar: Nicholas points out that when we were at Quest, we did shirts, Kevin Kline is Devine was one of the t-shirts and that was really good.

How can I fix performance in a third party app?

Brent Ozar: Pablo says, “I have this third-party application and it keeps creating lots of tables. And then, when we run reports, it does a bunch of union alls that take forever. What should I do to help this application’s performance?” Everybody has the same look…

Pinal Dave: Yeah, exactly. I just thought – I think they got us at one word, which is called third-party app. As soon as you say it, we just know what it is about. There are thousands of things we can talk about, union all is just the one thing which you can see probably. And there are so many things like indexes, statistics and hints they’re using inside the core. I’m sure there are so many views to support, so you can’t see what is written in the view, which you can just have to open it, encrypted stored procedure comes handy when you don’t want it – so many things, third-party app could have it, but I mean, this is the right place. I can tell you, there are three resources you can watch, I/O, memory and CPU. Those are the three going to be heavily consumed by this third-party app, no matter what you do. So whatever you can do at a database level, application level, without even touching the code, should be your priority when dealing with a third-party app because most of the third-party apps, as soon as you try to cut something, they somehow make sure they, for any other things that are not even related, they blame you for that.

Brent Ozar: True…

Pinal Dave: I mean, like, dropping an index and – yes, okay, can I tell a quick story before I let Brent and the team answer? Yesterday, I advised one organization who are using a product and I said why don’t you drop some of the indexes and just for trying, I said, let’s check all entire your code base and see if anywhere you are using that index hint, otherwise that would break. So they said sure. They went through the code base search. They couldn’t find that index and they decided to drop that index immediately. Then one of the major screen broke. They figured it out – they’re using now not even the index hint that way with the name of the index. They are using index hint with, like, the inter-parenthesis one, two, three, four, five. They are addressing the index not with the hint name, not index name. they are addressing them with the ID of the index, like four. So [crosstalk] you recreate that index, the fourth index would be something else. Think about what would be your performance. Third-party app can do anything.

Brent Ozar: That’s the worst hint I think I could imagine is specifying index hints by number because – say that somebody accidentally drops it, you don’t even know what it was. That’s terrible.

Tara Kizer: I’ve never even seen people do that. I didn’t even know that that was an option.

Pinal Dave: It’s an option and people practice it. And when I ask them and said why they do that, and their answer was beautiful, “Oh we come from a different RDBMS experience.” I was like, well, saying you are from that Oracle or MySQL or Postgres does not make you any smarter.

Brent Ozar: And they were probably bad in that database platform too. It’s not an excuse. It’s like being caught with a pile of drugs and saying, “Oh I’m from France. It’s legal in France too.” It doesn’t make a difference.

How has the reception been for SQL ConstantCare?

Brent Ozar: Let’s see, Greg asks, “How’s the reception been for ConstantCare? I’m on day one and I love what I see so far.” Greg, it was funny to see you sign up too. I think we broke past 300 servers. I know we’re up past 100 users and I think we’re past 300 servers.

Richie Rump: Yep, something like that.

Brent Ozar: Yes, we’re up over 150TB of databases monitored. So it’s been a lot of fun. It’s been keeping me busier than a one-armed paper hanger trying to give advice to these people for servers, so it’s been pretty funny.

Pinal Dave: I have one question for you because it’s a V1. I have been reading myself – and this question is from me actually and I’m just inserting because I get the opportunity. Who writes all these things? I mean, it doesn’t look automated.

Tara Kizer: It was me.

Brent Ozar: We started with sp_Blitz. So sp_Blitz gives you all these warnings about your stuff. So I told Richie, here’s what we want to build and we’re going to take the rules from sp_Blitz, and that’s like all the input he had form me and he, like, boom, took off.

Pinal Dave: the VMware suggestion, it blew my mind. I didn’t know that. just reading that email about VMware, I learned something. I was like, oh, I think a couple of my customers in the past, I never talk about this. They might be facing this. They put complete trust in my and I didn’t know that part even. I just learned a second ago.

Brent Ozar: To me, I think that’s one of the markets too that we’re aiming for is consultants and independent freelancers because it’s so nice to have a second opinion. Just tell me what’s going on else that I might have missed. And so it’s really good to get this list of, you forgot about this one rare issue, you know, this one rare poison wait or whatever. So it’s been a lot of fun with that.

Richie Rump: [crosstalk] a blast doing all this stuff and I know he is because he gets quiet and I’m like, oh he’s just having too much fun over there.

Brent Ozar: And Richie can spell the customer emails that I send out too, like seeing out what the recommendations are. So he sees where I’m tweaking things from the system and how I go into details and it’s fun.

Pinal Dave: This is amazing because one other thing I’ll tell you – because one place you have mentioned about CUs, it was so nice to talk about CUs because I knew that one and I was so happy; a lot of people do not know. So if you install SQL Server 2016 SP1 right away, suddenly you will see your locks going crazy. Your locks are so many that you would be like, what happened? I didn’t change my code. Everything was fine in 2014. And as soon as you start updating the latest CU, suddenly all the locks will disappear and there was resource semaphore. And then, we just had to create one index and resource semaphore one, but if somebody would have not updated the CU, which the suggestion was, they don’t have ConstantCare. And I was like, I can clearly see if this guy has not done CU and four of the servers were on a default, he would be facing the locking scenario, crazy, thinking they have a bad code and that’s not the case. It’s just CU update you just have to do and your life is all good after that.

Brent Ozar: Yeah, and if you apply a bad CU, if you apply a CU that breaks things, I want to be able to tell people very quickly.

Pinal Dave: For sure, yes.

I’ve been getting database mail errors about spawning processes…

Brent Ozar: Let’s see, Mike asks, “I’ve been getting errors…” Oh my goodness, Mike says, “I’ve been seeing SQL Server fail to spawn a thread to process a new login in the middle of the night. Should I add logging schedulers?” he wants to start logging different DMVs, schedulers, workers, tasks, every five minutes. “What should I do? It’s a brand new server with only one database. It’s got two cores, 8GBs of RAM and SQL Server Enterprise Edition.”

Tara Kizer: Why is this server so small when you’re spending so much money on Enterprise Edition? That’s what I want to know right off the bat. 8GB of RAM? My new desktop has a lot more.

Brent Ozar: And the two cores thing is tricky too because the minimum core licensing for a VM is four cores. Now, if you’re licensing by the host, you can license the whole host and then start small, but two cores of Enterprise Edition is $14,000. I mean, it’s a big deal.

Tara Kizer: I’m not even familiar with that message. Is that a specific SQL Server message? Because that doesn’t – I’ve looked at a lot of error messages in the past 20 years or so; that’s not one of them I’m familiar with. I’m wondering if this is something, an application error, instead.

Brent Ozar: Ooh, Mike says, “It came from DBMail.” Wait, so now this really makes me suspicious of the number of messages you’re processing in terms of DBMail. That seems sketchy. We should probably ask around the question too. So my personal thought on DBMAIL is I don’t want it to email customers directly. It’s one of those things that I would use for administration type stuff, but I wouldn’t want to try to make an app server out of it. It’s okay, it just doesn’t scale that well. Mike says, “It sends to DBAs.” Yeah, that’s not so bad. So I could see that. I don’t know that I would start by logging DMVs though. I would start by looking at wait stats – look at wait stats overall and see what you’re looking at then.

Pinal Dave: Because there are only two CPUs, so when he says CPUs, I assume there are two processors, right, then he might be running out of the threads during the night due to maybe some other operations. It might be conflicting with your other maintenance jobs. Just possible, thinking, because you said midnight.

Brent Ozar: Yeah, so it was probably got 50 jobs all starting off at exactly midnight. I used to do that as a DBA. It’s the middle of the night – just set everything to midnight. Then you could see all the lights in the data center get dim right at exactly midnight…

Richie Rump: It’s like four o’clock for ConstantCare.

Pinal Dave: right, when somebody said midnight, that’s the only thing that comes to my mind because I don’t know why, when I was a beginner, 12PM was like the most holiest time to do pretty much everything; fire off the backup, fire up the index maintenance and let them run in a catch-22 situation.

Brent Ozar: the server is totally idle from like 10 PM to midnight. There’s nothing happening, and then it’s like [crosstalk] of football players go running into the thing at the same time.

Pinal Dave: Right exactly at midnight, and yes, and [they never finish]…

Tara Kizer: And just to give an extra hint, look for thread pool waits. And so what I would do, for cheap monitoring, just log WhoIsActive to a table every 30 seconds, every minute. You could do it all day long, like I’ve done in the past, but if you just want to monitor for this specific issue, log that and then look for thread pool waits. You should see, in the info column, I believe it is. Look for blocking, but it might not be blocking. It might just be running out of worker threads because you only have two CPUs.

Brent Ozar: I should also point out that if you search on Bing for sp_WhoIsActive log to table, you get some really pretty adds on the side there about side tables made of logs. Tara has a blog post on logging activity with sp_WhoIsActive there that’s really good.

I need to sync data from a read-only database…

Brent Ozar: Oh, let’s see. Next up, Steven says, “I only have read access to a database and I need to take the data out of that and sync it to another database that I have full control over. What way would you use to get that done?”

Tara Kizer: You’re very limited. SSIS probably – I mean, you only have read access, you can’t add triggers, you can’t do anything. So you’re going to need some kind of crosses.

Pinal Dave: Right, or just backup and restore and remove the read restriction. Just thinking loudly, even if that is possible.

Brent Ozar: And Richie used to do a lot of this kind of thing…

Richie Rump: Yeah, when he said sync, that’s the key word for me. It’s like, what does that mean? Does that mean you need to have – hey, I’ve got one source of truth here and I’ve got the other in the other database and I’ve got to merge them together. That is a lot more difficult than just saying, I have an ID here and I just need to see if it exists or not in the other one. So SSIS, I guess, if it’s the easy one. If you need to merge them together, you’re looking at a lot of work there. That’s not simple.

Brent Ozar: At least he said he only has read access to one of them, so it’s probably not that, thank goodness because that is terrible.

Richie Rump: Yeah, I mean, it said sync, you know. When I see sync, I don’t think – I’ve spoken to too many executives, so that’s my problem.

Brent Ozar: Well, and they always end up saying the same thing, “We just need to get this one row back to the other side. It’s just one row. How hard can it be?”

How can I reduce blocking between an insert and an update?

Brent Ozar: Naveen asks, “We have two processors running on SQL Server. One’s doing an update, one’s doing an insert. They both kick off at the same time and they’re getting blocked by each other. What should I do to reduce blocking timeouts?”

Tara Kizer: Who is the asker on this one? Read it again.

Brent Ozar: Naveen. And so for behind the scenes stuff for you all, we can all see the same panel of questions but so many of you pick out the questions, the reason why I read your name out, it’s not because I care about you, I don’t care about you at all, but I want the rest of my co-presenters to see which question we’re dealing with.

Richie Rump: But we’d love to mentor you…

Pinal Dave: The update statement is timing out. That means it’s not definite. I mean, it’s just locked on that select or when both of them started a process. One thing, if they’re deadlocking or just locking, and it says after 30 seconds of blocking. So is it an application thing? Maybe there is a timeout. If he just waits a little bit longer, it might be finished. I’m just thinking loudly. Like sometimes, you know, we give very short timeouts and things have to – and I tell everybody that locking is alright, it’s just all about waiting game. Once somebody finishes, the second person is going to get a turn, but deadlocking is bad, so that’s the reason that timing out it immediately. So when I see this one, I might just think that maybe you could change the timing, if possible, if you can do it. If that is not possible, see if one of the processes finishes first so you don’t have to wait for a 30-second timeout to kick in. and if, no matter what you do, if 30-second timeout is still kicking in, well, one of the processes is going to have to be finished one time or another time, then I think another one has to wait. So increase the timeout, and if all of them is not an option, then you just have to look at your company and see your database and see what blockers are there and take from WhoIsActive you can take it, or you can take it sp_Blitz – start seeing all the blockers and start removing one at a time.

Tara Kizer: I would just wonder, because the update statement is the one that’s timing out, so the insert one is the one that’s causing the blocking. Is this a single row insert? Investigate that process. Is it a very large query that’s getting inserted into a table? Investigate the insert and see if that can happen any faster or any indexes that can be added. If it’s just a singleton insert, you know, I doubt that that’s the culprit here.

Brent Ozar: Oh my god, he follows up with, “The insert’s doing 20,000 rows.”

Tara Kizer: Okay, well, I mean, you know, break that up if it can’t complete in 30 seconds. So look at your processes and do things differently. Is it a bulk insert? A bulk insert can be pretty fast. 20,000 rows isn’t any big deal for a bulk insert.

Pinal Dave: What Tara said is very, very true. One of the demonstrations is my favorite demonstrations, for pretty much all my presentations I open with it, where I just show them that how a lot of indexes can slow down your inserts and a lot of people even think it’s linear. So if you have one index, your system is taking three seconds to update. If you create two indexes, a lot of people say it will take six indexes to update. That’s not the case.

When you create one additional index, amount of the administrative tasks SQL Server has to do around that index update is way more than just doubling the time than your original update. So it’s quite possible, a lot of people say this is very generic advice, they say just remove all your indexes and recreate back. That is something your ETL process has to do, but again, that’s just a tradeoff. So you save time now, and when you rebuild all the indexes, you are going to lose the same time. So end result, it’s going to be the same thing, it’s just what makes you happy.

How can I capture who does what?

Brent Ozar: Let’s see, Emily says she’s been using a trigger to find out which login is logging in, like what users are logging in, and what database they access. But when they log in, people usually seem to log into master. “Is there a way that I can capture what queries people do in which databases?”

Tara Kizer: What are you trying to solve here? Because you’re going to be adding some performance overhead if you want to figure out what people are doing. I mean, you could possibly set the default database to whatever database they should be in. That way, they don’t go to master first because they’re going to end up in the right database with their queries, since the applications are working. But if you switch the default database to whatever database they should be in, you might be able to have easier success with this. I just wonder what you’re trying to solve here. I mean, you’re obviously auditing and this is going to eventually cause performance issues.

Brent Ozar: It’s hard when people do cross-database queries too, you know. They’ll join across five different databases.

Richie Rump: Because they can.

Brent Ozar: And union all, because they’re third-party vendors.

How can I get my data offsite fast?

Brent Ozar: Daryl says, “I want to get my data offsite. Y’all have mentioned backups to the cloud. What options do I have to get my data off to the cloud that would get me protected quickly?” And Tara wrote a white paper about this.

Tara Kizer: I like referring to that whitepaper with clients that don’t have any disaster recovery in place and they’re not willing or not able to currently spend any kind of money on a disaster recover solution. So I’m just like, well, how about just this cheap solution where you just send your backups to a storage bucket somewhere in the cloud, and then you at least have that. I mean, obviously, you’re going to need to send your source code. You can’t just have a database without an application, you know. So at least get that stuff out there. That way, if the primary site ever goes down, you could spin up servers in the cloud and then restore everything after that. It’s not going to be easy. It’s certainly not going to be easy.

From the database perspective, it’s fairly easy. The whitepaper covers all the stuff that needs to happen, but getting your whole environment up and running up there is going to be very, very painful, but at least you’re not toast completely.

Brent Ozar: It’s practically free too. It’s really cheap to get the log shipping going up there.

Should I drop my indexed views when changing replication?

Brent Ozar: And then the last one we’ll do, J.D. says, “I’m sorry for another replication question. We have indexed views on our replicated database, but because they require schema binding, we’re using pre and post replication scripts to drop and create the views and indexes. Is this a bad idea?

Tara Kizer: I don’t necessarily know if it’s a bad idea or not. It sounds like it’s working for you, so how can it really be a good idea if it’s working for you? I have not used indexed views in conjunction with replication, but I have had to use post scripts for replication where our source schema and the publisher was going to be different than the subscriber. So in the replication stored procedures, we changed those inserts and updates – it was just the insert and update stored procedures to modify the schema, and that worked fine for us. So if it’s working for you, is it a bad idea? I’ve never heard that it’s a bad idea, but I don’t know how many people are using replication in conjunction with indexed views.

Brent Ozar: Well thanks a lot everybody for hanging out with us this week. Thanks, Pinal for joining us and look forward to seeing people in your training class coming up in June as well.

Pinal Dave: Oh yes, I’m pretty excited. So yes, as we start to discuss, there are a few signups and I’m looking for a few more people to join in with us because I can promise it’s going to be fun and a lot of interesting demonstrations. And most important thing, what I’m looking at is the module three where cheating is allowed. I want to see how people cheat with each other and come up with the wrong answer.

So that’s going to be fun. If you come up with the right answer, bravo, but most probably, I’m going to be in there, so the person who gets the most number of the wrong answer, you are going to be the winner that day.

Brent Ozar: Nice. We’ll see everybody next week at Office Hours. Adios, everybody.

How To Break SQL Server’s XML Data Collection

Last Updated April 5, 2018

Humor

Oh, XML

When we first met, it was about 2011. I had to load files full of you into a table. It wasn’t so bad.

You were well-formed, and didn’t change from file to file. Even using SQL Server 2005, I could take you.

Later on, we got really close when I awkwardly started working on sp_BlitzCache, like a divorce-dad trying to figure out what you like so our weekend visits wouldn’t be so painful.

We learned to get along. Heck, we had some good times.

But the plan cache isn’t the only place that you get used. No, Microsoft uses you in all sorts of crazy, mixed-up places.

Extended Events
Deadlocks
Service Broker (I know, I know…)
SSIS… things. Probably.

There’s likely a bunch more, but hey. There’s only so much sunshine.

It’s Time We Talk

No, I’m not leaving you for JSON — JSON isn’t looking for anything serious. Heck, we’re not even sure if JSON will get deprecated for whatever is popular on Hacker News in a couple weeks.

It’s just that, when it comes to storing information about deadlocks, you haven’t been handling yourself so well.

I think you’re on Predefined Entities. I think you have a problem.

You see, if someone creates a table that has a funny character in it — “, &, >, or < — you don’t sanitize all your inputs. Then you throw an error when we try to parse you.

The worst part is that you get it right sometimes.

But other times… Other times!

Even execution plan XML gets this right.

We Got Some Breaking Up to Do

If you don’t change your ways, (hint — that’s a Feedback Item, you should go vote for it if you want me to like you), you’re just going to be another known limitation in a long line of scripts.

Thanks for reading!

Why sp_BlitzLock Can’t Show Complete Columnstore Deadlock Information

Last Updated June 25, 2021

Deadlocks

People seem to care about deadlocks

That’s why I wrote sp_BlitzLock. Why not sp_BlitzDeadlock?

Well, I had this song stuck in my head. But enough about that.

While poking around with things recently, I created a pretty typical deadlock on a table, but this one had a clustered column store index on it.

Of course, that makes things different.

So we go digging

The deadlocks I generated were on a single object. Just a couple inserts and deletes in two separate transactions. Nothing special.

This is the information that I pull out with sp_BlitzLock

But what’s missing from the column store deadlock information is really weird. Namely, everything.

One may be tempted to try to use the associated object id to get the object name, and one would be left wanting.

SELECT OBJECT_NAME(72057594059948032)

Msg 8115, Level 16, State 2, Line 12
Arithmetic overflow error converting expression to data type int.

SELECT OBJECT_NAME(72057594059948032)

Msg 8115, Level 16, State 2, Line 12

Arithmetic overflow error converting expression to data type int.

It’s a bug in Microsoft SQL Server.

I am currently unable to give you as much information about column store deadlocks as I’d like to.

Unless Microsoft responds to my ~~Connect~~ Feedback Item.

Brent says: this is such a great example of why niche features run into interoperability issues. They work just fine by themselves, but they’re not tested in combination with other things – even when those things might be mainstream, like deadlock graphs. I bet this bug gets fixed quickly though – with columnstore indexes gaining popularity, more folks will be running into this issue fast.

Update June 2021: Microsoft still hasn’t responded to the bug report.

Building SQL ConstantCare®: 10% of you have Priority Boost on.

Last Updated April 10, 2018

One of my favorite things about my job is being able to take the pulse of how real companies are managing their databases. I don’t wanna be locked in an ivory tower, preaching down to you, dear reader, about how you should be doing five million things a day – when in reality, you’re struggling to get an hour of real work done because you’ve got so many meetings.

But sometimes I wonder – am I out of touch? When I was a DBA, I remember struggling with backups and corruption checking – has that gotten easier? Have today’s DBAs started using more automation and tools to protect their estate? Is Transparent Data Encryption catching on? Did DBAs start using Extended Events for monitoring when I wasn’t looking?

And it’s important because I wanna build the right training material and scripts for our customers. I see a problem trending, I want to be able to give people the right information to fix the problem, fast.

When we launched SQL ConstantCare®, I was excited to see what the data would reveal. 86 users have opted into public data sharing for 285 servers hosting 11,521 databases.

Here’s some of the interesting things we’ve learned so far.

SQL Server adoption is still slow.

It’s spring 2018, but SQL Server 2017 still has less adoption than SQL Server 2008, let alone 2008R2. I’m really curious to see how this progresses as we move towards the end of support for 2008 and 2008R2 next year.

In terms of edition, I went in with no expectations – I really have no idea what our readership looks like, and it’s interesting to see numbers:

Over 50% of us had basic backup issues.

58% of all us had at least one database (37% of databases overall) that hadn’t had a full backup in the last week. Now this is a little tricky: in the initial round of collection, I noticed a trend that a lot of people would add a development server first, then look at the email advice to decide whether they wanted to add more servers.

However, I also noticed a trend amongst the replies – paraphrasing:

“You know, I’d forgotten about those databases. We restored that a while ago to get some data out of it, and then I guess I forgot to delete it. I’ll go delete those now.”

Similarly, 38% of all servers had databases in full recovery model, but weren’t doing transaction log backups on them.

Between dropping databases that shouldn’t be around (shout out to the multiple folks that had AdventureWorks in production), plus suddenly clearing away unnecessarily giant log files, I can see how the database size tends to drop quickly on servers within the first few days of setting up SQL ConstantCare. (We’ll do an ROI study on that over time.)

We have a mixed record on corruption checking.

93% of all databases had a CHECKDB in the last 2 weeks! That’s awesome!

However, things were a lot worse when it came to enabling checksums for page verification. 5% of databases didn’t have checksums turned on, which sounds small, but it was spread across 54% of the users. Look at the person sitting next to you: either this issue affects you, or it affects them. One of you needs to buckle up.

We’re still not patching.

16% of customers are running a completely unsupported build of 2008 or newer (meaning they haven’t applied a service pack in a few years)
10% of customers are running builds with known corruption or security escalation bugs
Hardly anyone is patched for Meltdown/Spectre

As a teacher and consultant, I gotta think hard about that. I don’t have easy answers. It’s not like I can just build a presentation and magically get the business to agree to outage windows.

Long term, I’m thinking of it as a data problem: can I tie peoples’ server problems to a specific CU that has a fix for their issues? That won’t be on the horizon for the product in 2018, but it’s an interesting long term challenge.

37% of us are experiencing poison waits.

When RESOURCE_SEMAPHORE, RESOURCE_SEMAPHORE_QUERY_COMPILE, and THREADPOOL strike, it can feel like your SQL Server service is completely locked up – even though you can remote desktop into the base OS and it responds just fine.

When I talk to training classes about that, I’ve been saying that most of you will be able to go your entire career without having to troubleshoot those issues. Turns out I’m completely wrong, and I need to start talking about these more often, like blogging about how to recognize the symptoms even if you’re not monitoring wait stats.

On a related note, 16% of us have had memory dumps recently. My old advice was to install the SSMS memory dump upload utility, but Microsoft shut that down – likely due to security issues around GDPR, since memory dumps can include PII. Before I write new advice there, I’m going to dig deeper into the data – like if the dumps are correlated to specific builds/versions – to improve my advice.

We still have priority boost turned on.

Thanks, SSMS

It’s hard for me to believe that Microsoft still exposes this as an option in SSMS 17.6. Users should be protected from themselves and from really bad Internet advice – this should be deprecated, burned, and pushed to the bottom of the ocean.

Here’s the real kicker, though: ten percent of us have a server with Priority Boost on.

Another way to think of it: when you’re in a user group session with 40 other people, 4 of them have Priority Boost on. Or maybe you, and 3 other people.

The exciting thing is that we can track what happens to wait stats as people turn that feature off, and then prove if it made things better or worse. (Over 20% of us have either auto-close or auto-shrink enabled on databases, too, but thankfully it’s confined to about 11% of our servers.)

I’m barely scratching the surface here of what we’re learning. As the data grows, I’m really looking forward to showing users comparison data of how they rank related to other shops, how their database health stacks up, and the easy stuff they can do to get better.

Read more of my SQL ConstantCare posts, or sign up now.

The Many Mysteries of Merge Joins

Last Updated April 8, 2018

Execution Plans, Indexing

Not A Single Picture Of A Zipper

Image humbly borrowed from https://70srichard.wordpress.com/2014/12/17/beverly-hills-cop/

There are some interesting things about Merge Joins, and Merge Join plans that I figured I’d blog about.

Merge joins have at least one interesting attribute, and may add some weird stuff to your query plans.

It’s not that I think they’re bad, but they can be tricky.

Lots of people see a Merge Join and are somewhere between grateful (that it’s not a Hash Join) and curious (as to why it’s not Nested Loops).

Oh, you exotic Merge Join.

E pluribus unum

In a “good” Merge Join, the join operator in the query plan will have the Many to Many: False attribute.

The optimizer knows this because the Primary Key on Id (though a unique index or constraint offers similar assurances) is distinct for each value in the Users table.

Having one unique input give you a one to many Merge Join.

The statistics TIME and IO profile for this query is about like so:

Table 'Posts'. Scan count 1, logical reads 2282, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Users'. Scan count 1, logical reads 417, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 329 ms,  elapsed time = 333 ms.

Table 'Posts'. Scan count 1, logical reads 2282, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Users'. Scan count 1, logical reads 417, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 329 ms, elapsed time = 333 ms.

Not too shabby for one meeeeeeeeeeeeeeeeeeeellion rows.

E pluribus pluribus

In a “bad” Merge Join, that attribute will be True.

Why does this happen, and why is it bad?

It happens most commonly when there are, or may be duplicates on both sides of the results. They can also happen when the outer input has duplicates, but quite often the optimizer will rewrite the JOIN order to put it on the inside to avoid having to use a worktable, etc. Thanks to Adam and Kalen for nudging me in the comments to clarify this part.

Internally, the Merge Join will spin up a work table (similar to how a Hash Join operates), and work out the duplicate situation.

The stats TIME and IO profile of this plan looks like so:

Table 'Worktable'. Scan count 1, logical reads 28836, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Posts'. Scan count 1, logical reads 38, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Votes'. Scan count 1, logical reads 5, physical reads 1, read-ahead reads 464, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 343 ms,  elapsed time = 366 ms.

Table 'Worktable'. Scan count 1, logical reads 28836, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Posts'. Scan count 1, logical reads 38, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Votes'. Scan count 1, logical reads 5, physical reads 1, read-ahead reads 464, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 343 ms, elapsed time = 366 ms.

If we were to, say, choose that serial Merge Join plan in the compilation of a stored procedure that became the victim of parameter sniffing, we could run into trouble.

Table 'Worktable'. Scan count 505697, logical reads 67579461, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Posts'. Scan count 1, logical reads 62710, physical reads 0, read-ahead reads 55333, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Votes'. Scan count 1, logical reads 15112, physical reads 0, read-ahead reads 14640, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 459079 ms,  elapsed time = 459432 ms.

Table 'Worktable'. Scan count 505697, logical reads 67579461, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Posts'. Scan count 1, logical reads 62710, physical reads 0, read-ahead reads 55333, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Votes'. Scan count 1, logical reads 15112, physical reads 0, read-ahead reads 14640, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 459079 ms, elapsed time = 459432 ms.

Yes, that is seven minutes and forty seconds. A little over half of one metric cigarette break.

Head to head, the large merge (many to many) is costed higher than the smaller merge (one to many). But you won’t see that in a parameter sniffing situation.

You’ll only see the lower costed Merge Join version of the plan.

The optimizer will sometimes to try protect itself from such hi jinks.

Aggregation Ruling The Nation

In some cases, the optimizer may inject an aggregation into one side of the join ahead of time to distinctify the data. It doesn’t need to do both — we only need one distinct input for the many to many attribute to be false.

I haven’t seen a situation where both inputs get aggregated merely to support the Merge Join, but it might happen if you ask for other aggregations on the join column.

It can use a Stream Aggregate, which would generally make more sense, since both the Stream Aggregate and the Merge join require sorted data.

Under less reasonable circumstances, you may get a Hash Match Aggregate. This plan has the additional misfortune of needing to re-order the data for the Merge Join. Teehee.

If you see this, something has truly gone wrong in your life. This query was heavily under the hint-fluence.

A Sort Is A Sort

Much more common is seeing a Sort injected into a plan to support one or more downstream operators that require sorted input (Merge Joins, Stream Aggregates, Windowing Functions).

For instance, a query like this:

WITH Votes
    AS (   SELECT   v.UserId,
                    ROW_NUMBER() OVER ( PARTITION BY v.UserId
                                        ORDER BY v.Id ) AS rn
           FROM     dbo.Votes AS v )
SELECT  COUNT(*) AS records
FROM    dbo.Posts AS p
JOIN    Votes AS v
    ON p.OwnerUserId = v.UserId
WHERE   v.rn = 1;

WITH Votes

AS ( SELECT v.UserId,

ROW_NUMBER() OVER ( PARTITION BY v.UserId

ORDER BY v.Id ) AS rn

FROM dbo.Votes AS v )

SELECT COUNT(*) AS records

FROM dbo.Posts AS p

JOIN Votes AS v

ON p.OwnerUserId = v.UserId

WHERE v.rn = 1;

May give you a plan like this:

In this case, the Sort happens early on to support the Window Function. It also aids the Stream Aggregate, but whatever. Once data is sorted, the optimizer tends to not un-sort it.

The Sort in the next example will be needed with no index on, or where the join column is not the leading column in the index (there’s an If here, which we’ll get to).

If this is my index on the Votes table, data is sorted by PostId, and then UserId

CREATE INDEX ix_votes_uid ON dbo.Votes (PostId, UserId);

1	CREATE INDEX ix_votes_uid ON dbo.Votes (PostId, UserId);

When my query is just a simple join, like this:

FROM    dbo.Votes AS v
JOIN    dbo.Posts AS p
    ON v.UserId = p.OwnerUserId
ORDER BY p.OwnerUserId;

FROM dbo.Votes AS v

JOIN dbo.Posts AS p

ON v.UserId = p.OwnerUserId

ORDER BY p.OwnerUserId;

My query plan will look like this, with a big honkin’ Sort in it:

On the other hand, if my query looks like this:

FROM    dbo.Votes AS v
JOIN    dbo.Posts AS p
    ON v.UserId = p.OwnerUserId
WHERE v.PostId = 11227809
ORDER BY p.OwnerUserId

FROM dbo.Votes AS v

JOIN dbo.Posts AS p

ON v.UserId = p.OwnerUserId

WHERE v.PostId = 11227809

ORDER BY p.OwnerUserId

My WHERE clause filters the leading column to a single PostId (one Post can be voted on by many Users), the UserId column will already be sorted for that single PostId value.

We won’t need to physically sort data coming out of that.

Out In The Street, They Call It Merge Join

I hope you learned some stuff that you can use when troubleshooting, or trying to understand a Merge Join plan.

This is one of many places that the optimizer may inject a Sort into a plan that you didn’t ask for.

Thanks for reading!

ZIPPER FREE!

Building SQL ConstantCare®: The Minimum Viable Product (MVP)

Last Updated April 10, 2018

When you wanna build something, how early do you let the public in? Where do you draw the lines with the features you absolutely have to have – versus the stuff you just want? Code is never really done.

The lines are so blurry these days with labels like Alpha, Beta, Private Preview, Public Preview, Early Access, Limited Availability, Regional Availability, General Availability, etc. Companies wanna get something in your hands as quickly as possible so they can start learning from what you like and what you use. What’s the first thing you actually ship?

MVP: Minimum Viable Product

In Eric Ries’ excellent book The Lean Startup, he talks about how your company should rapidly iterate along:

Release something as quickly as possible
Measure how customers use it
Learn lessons from those measurements, and then
Go back to step 1

The first time you hit step #1, that’s called the Minimum Viable Product (MVP). Your MVP doesn’t even have to be an app or online service – it could be a manual process, or it could even be just a signup form for a service that doesn’t exist yet. In a sense, you could think of sp_Blitz as the MVP: a script that people could run and get advice about their server.

If you were going to build a system to give people advice about their servers, here’s a few ways you could do it:

Totally Manual Process: you build a list of queries, they copy/paste the queries into SSMS, copy/paste the results into Excel, email them to you, you analyze the data manually, and you manually email them a reply.
Automatic Collection, Manual Analysis: you build an application that runs a bunch of queries and dumps the data into files (say Excel). The user emails you the files, and then you analyze them manually, and tell them what to do.
Automatic Collection, Automatic Analysis: you build an app that runs queries, sends the data to you, and then robots analyze the data and send recommendations to the end user.

Obviously, #3 is a hell of a lot harder than #1, and it ain’t minimum.

You’re probably looking at what SQL ConstantCare® does and thinking, “Wait, he screwed up – he jumped straight to #3.” Well, we’d already done #1 early in our consulting practice, and #2 was our SQL Critical Care® consulting service. (Richie had built a data collection app to make our process faster.)

There’s a lot of gray area in #3, though. We made a lot of brutal decisions about what we would do in order to get the MVP out the door.

What we skipped

We wanted to collect data, put it into a database, and send you emails with advice – but everything about the process was up for debate. If we could put something in your hands faster by making some tough decisions, then we made ’em.

“Mute” links – I wanna make it as easy as possible for you to permanently mute servers, databases, or specific alerts that you don’t care about. Soon, the emails will have mute links, but for now, we’re having folks just hit reply and tell us what they want muted, and we mute ’em on the back end.
Self-updating app – really wanted this for v1 because I figured we’d be iterating fast over the collector & queries, but no go. The components are there, but you have to run ConstantCare.exe yourself manually if you want to get the update. Has to do with permissions gotchas with Squirrel, the updating tool we used.
Windows Service (rather than scheduled task) – an always-running service would let this thing work better in a data center environment, but I’m also hesitant to deal with the support gotchas involved with an always-on self-updating service.
Team hierarchy – later, I’d love to designate different people for different groups of servers, or different levels of alerts (like production DBA vs developers.) For now, if you want different teams to get emails for different servers, you’ll need to install different instances of the collector.
Interactive web site with your data & recommendations – didn’t wanna hassle with logins for now.
Shareable anonymized reports – I’d love to give you the ability to pass your DMV data on to your software vendor or consultant and say, “Here’s what we’re up against – you take a look and tell me what you think.”
AG/DR-aware checks – if you have an AG, I’d like to be able to identify backup history across all the nodes and tell you if you have gaps in coverage. (As long as we’re getting backup data across all replicas, this should be doable – just takes more query work.)
Troubleshooting tools – sure, it’d be nice to have a dashboard showing where all the incoming files are at in processing, but it doesn’t make sense to build something like that when our processes are changing so fast during the MVP.
Automated wait stats analysis – for example, “Your server is waiting on storage, but it’s not because the storage is bad – you just need to change max memory because it’s set incorrectly. Change it to ___, here’s how, and here’s how safe (or unsafe) it is.” Right now, I’m doing this manually for customers, building a manual process of what I’m thinking as I do it.

What we shipped

Users install a desktop app that sends us diagnostic data once a day. Here’s a video showing how you install it – warning, it ain’t pretty:

SQL ConstantCare Installation

Watch this video on YouTube

There’s no GUI, there’s no wizard to add servers for you, and there’s no input validation if you screw up your email address or server names. There’s a lot of places where things can go wrong. I’d love to have an entirely graphical setup process that sweeps the network, suggests the SQL Servers that you would want to monitor, and guesses the right time zones for each server.

I would also like a pony. (Not really. Ponies smell bad. Except this pony.)

But you get what I mean – you go to ~~war~~ market with the ~~army~~ installer you have, not the ~~army~~ installer you want. Besides, this is a one-time experience for users – the more important experience was getting the emails that were valuable and actionable.

The emails – that’s where the value comes in. Right now, Lambda functions analyze the data and send us emails that look like this:

Erik needs to do a better job of backups

They’re the same emails that you’ll eventually get directly – but for now, we’re watching them ourselves, making sure the functions are returning the right recommendations, and then manually sending guided recommendations directly to the customers.

Then for some customers with special situations, or where the automated emails aren’t quite the right fit, we send manual emails like this (sanitized customer email sent to myself):

Is that a lot of work? Sure it is – early adopters are basically getting a screamingly cheap deal on our personalized attention via email. If you’re going to automate something big, though, you’ve gotta start by making sure you can do it manually first.

How the MVP is scaling so far

The line is the number of servers we’re analyzing (from customers willing to share their numbers – more on that in the upcoming GDPR post), and the bars are the terabytes of data on those servers. The jump yesterday was the first day of the marketing launch:

Yes, the terabytes of data have gone down more than once as folks have suddenly realized, “Whoa, someone restored a bunch of databases onto a production server, and we forgot to get rid of them. They’ve just been making our maintenance windows take longer every night.” My favorites are when we’ve found AdventureWorks on production servers.

Those server numbers may not look big – we could have easily processed 156 servers’ worth of diagnostic data with a conventional Windows app running in a VM – but check out these hosting costs:

We spent ~$650 last month on hosting, and is projected to come in around $1k this month. That includes development environments, by the way. This is inexpensive enough that I could afford to absorb it for even just a handful of customers if it hadn’t caught on.

As more customers start to diagnose more servers, serverless really pays off. When new customers join, they seem to wait a day or two to set up collection, then they set up collection for just 1-2 servers, see the results, and then suddenly go, “Whoa, I should add a few other servers in too.” We added 49 new customers yesterday, so I wouldn’t be surprised if we were monitoring a total of around 300-500 servers by the end of the week.

In preparation for that, Richie’s been putting a lot of work into tuning data ingestion and processing the rules quickly, staying ahead of the Lambda function timeouts and the growing data set.

As we go, we’re learning stuff, tweaking the system, and figuring out what things users can fix on their own and which ones they need more help with. It’s even driving the blog posts I write – for example, when I see a problem on several servers in a row, and I have the same discussion with customers about it a few times, that means I need to write a post to link folks to. The results are posts like “But I don’t need to back up that database” and Why Multiple Plans for One Query are Bad.

What’s on the roadmap next

Now that we’ve got the MVP out, we’ve been working on:

Quality checking on the automated emails – we’re pretty close to the point where we’ll start letting the automated emails go straight out, but then once users have fixed the first round of problems (like no backups, obvious server misconfigurations, etc) then the human intervention will kick in later. Right now, with ~50 active paying customers and ~150 SQL Servers, it’s easy enough that I can still keep an eye on this manually, but we’ll be switching over soon.
Back-patting rules – tying together a recommendation we made, a customer’s successful application of that recommendation, and the difference it made in health or performance metrics.
Not breaking the build – seriously, I have a nearly 100% failure rate on my commits. Richie has to be gritting his teeth by now every time he sees one of my pull requests.
Wait stats trending – right now I’m manually trending wait stats, then emailing customers an analysis. We’ll need something more scalable as we go to 1,000 servers and beyond. When you build an MVP, though, you gotta do things that don’t scale.

Read more of my SQL ConstantCare posts, or sign up now.

Building SQL ConstantCare®: What Data We Collect (and What We Avoid)

Last Updated April 10, 2018

Next up in my series of behind-the-scenes posts about our new SQL ConstantCare®, let’s talk about the kinds of SQL Server diagnostic data we’re gathering, why, and what data we’re steering clear of for now. (Psst – read through to the end for a huge, crazy discount today, too.)

We – you and I – are paranoid.

We’re a data professionals, first and foremost. During my consulting intro calls, I’m really proud to say that we don’t need to get a VPN connection, remote access, or a SQL Server account. You know how it is – DBA, Default Blame Acceptor – and these servers are already in bad shape. The last thing folks want is some consultant going rogue and “fixing” something, so we make it clear from the get-go that won’t happen.

Because we’re so paranoid, my first thought when deciding to collect diagnostic data was, “I don’t wanna end up in the news.” I bet you feel the same way. Therefore, we thought about how we could give users as much valuable insight as possible from as little data as possible.

I drew a big, clear box around queries, query plans, and statistics and said, “We’re not going here in v1.” That data can produce awesome performance recommendations, but it often contains personally identifiable information. I know some people are willing to share that data on a plan-by-plan basis – the referral-marketing success of PasteThePlan proves it. I even bet a lot of people are even willing to share a lot of their plans on a regular basis as long as they’re getting valuable performance advice. However, I think we can still provide a lot of value without having that data, and we have plenty of Lean Startup lessons to learn along the way first. For now, rather than automating that part of the analysis, we simply tell customers in their advice, “If you’re able to tune queries or indexes to fix this, here’s the exact parameters to use with sp_BlitzCache and sp_BlitzIndex to find the root cause of this problem.”

We started with the only data to run the most urgent sp_Blitz checks.

To pick the data we’d gather at first, Richie took the list of sp_Blitz checks from priority 1 to 100 – the most urgent stuff – and came up with the list of data requirements to be able to send emails like this:

You’d be surprised how many real SQL ConstantCare emails start like this.

To see the queries we run to get that data, look at the queries ConstantCare.exe runs by going to Windows Explorer and opening the folder:

%localappdata%\ConstantCare\current\Resources\Queries\

1	%localappdata%\ConstantCare\current\Resources\Queries\

The way it works is that ConstantCare.exe connects to the SQL Server, runs these queries, writes the results to an encrypted zip file, and uploads it to a private bucket in Amazon S3.

At this point, you might be thinking, “Why not just run sp_Blitz on the server and harvest the results?” We wanted to design checks that worked with data over time – for example, after we find auto-shrink enabled, and after you turn it back off, how are your wait stats affected? (And yes, we’ve already found over a dozen servers with auto-shrink enabled, hahaha.)

Exposing the queries directly to customers also has a side benefit: later this year, they’ll be able to save override versions of the queries. This way, if there are some parts of the diagnostic data that you don’t wanna send in, you can override that portion of the query (returning nulls, or masked data, or just not return any rows at all.) This presents a little more design and support work for us since missing rows might trigger a false alarm – for example, if no backup data comes in, we don’t wanna say, “OMG, you aren’t backing up your databases!”

Serverless architecture means per-database checks are harder.

We hit a few challenges early:

In the past, we’ve used sp_MSforeachdb, but that doesn’t work in Azure SQL DB (and we wanted eventual compatibility with that, although not necessarily in v1)
People have a crazy number of databases per server (think thousands)
People who have a crazy number of databases also seem to have a crazy number of indexes per database (think thousands per database, times thousands of servers)
And they also have a crazy amount of backup job executions (especially if they took my advice about backup frequency)

Moving large files around in the cloud isn’t a big deal, but processing them is, especially with serverless. AWS Lambda functions have a 5-minute timeout, and we use Lambda functions for stuff like:

Importing the client’s JSON data into the database – in the first version of the intake process, we were processing incoming files line-by-line – and hoowee, did that not work with the crazy-number scenarios. After hitting timeout walls, Richie later changed it to pull the whole JSON file into the database as a set, plus we held the index analysis back for a later version.
Analyzing the data to build email content – just like your homegrown queries look at SQL Server’s DMVs to figure out what’s broken, we’re using Lambda functions to run database queries to look at your DMV data. It’s just that the data is now stored in a central database on our end. Imagine a database server that holds DMV data from lots of customers – we’re gonna run into performance issues as we gain clients, and retain more database history. If your homegrown query takes over 5 minutes to run, you’re the only one who cares – but if our AWS Lambda functions time out over 5 minutes, you won’t get your report, and I care about that.

Our SQL ConstantCare Slack channel before performance tuning

Watching function runtimes and timeouts has probably been one of the more interesting challenges so far. We have a company Slack channel for SQL ConstantCare® where status updates go and we talk about what we’re doing about ’em. We laugh a lot in there. Also, swearing and gifs.

The whole mission and design of SQL ConstantCare® means these timeout errors are less of a concern – it’s a mentoring product, not a real time monitoring product, and users don’t expect to get an email instantly after their data is sent in. (Plus, data is sent in daily, but for now, analysis & mentoring emails only go out weekly.) The database size is less of a concern, too – during the initial Minimum Viable Product (MVP) phase of the project, we could even delete and redeploy the database from scratch whenever we want, and the product still worked fine.

Read more of my SQL ConstantCare posts, or sign up now.

[Video] Office Hours 2018/04/01 with guest host Nigel Foulkes-Nock

Last Updated April 1, 2018

Brent Ozar Unlimited Team

Humor

Today’s Office Hours is brought to you by NigelDBA.com:

Office Hours - April 1st 2018

Watch this video on YouTube

Nicely done, @nigelDBA. Nicely done.

[Video] Office Hours 2018/3/28 (With Transcriptions)

Last Updated March 31, 2018

Videos

This week, Erik and Richie discuss what to do first at a new job that never had a DBA before, basic vs advance DBA skills, tempdb files, downgrading SQL server versions, a database architecture issue, tools for monitoring SQL Server instances, other areas to focus on as a DBA besides HADR and query tuning, SQL Constant Care®, and database backups.

Here’s the video on YouTube:

Office Hours Webcast - 2018/3/28

Watch this video on YouTube

You can register to attend next week’s Office Hours, or subscribe to our podcast to listen on the go.

Enjoy the Podcast?

Don’t miss an episode, subscribe via iTunes, Stitcher or RSS.
Leave us a review in iTunes

Office Hours – 3-28-18

What should you do on your first day on the job?

Erik Darling: Let’s see, I’m going to try this name, but I’m not going to do well at it, Seybou – I like that you have B-O-U in your name; that’s Brent Ozar Unlimited – asks, “What should you do first when you start a new job at a company that never had a DBA before?” My man, backups, backups, backups, backups, backups, backups – make sure that the backups are in order before you do a damn other thing. If you don’t have those, like, it doesn’t matter if you’re checking for CHECKDB, checking for corruption, it doesn’t matter what else you’re doing in your life, if you don’t have those backups, you are missing the biggest piece of the DBA puzzle. What do you think, Richie?

Richie Rump: Oh no, that’s exactly right. So in fact, we don’t even do consulting gigs unless we have backups from the client. And we actually go in and check and make sure that they have valid backups for everything. And there’s been times where we’ve actually cancelled gigs because they haven’t had valid backups. So we’re practicing what we preach; that’s all I’m saying.

Erik Darling: Not even like, you know, the situation where it’s like, they’ve never taken a backup. It’s like, if their backups are like more than a week or ten days old, we’re still like, you have to take one like now because we don’t know what was in that – we don’t know what was going on the last time you did anything. I feel like a nerd now, everyone can see my EpiPen. I went to get my allergy shot yesterday and I have to bring this thing with me to get my allergy shot, or else they won’t give it to me because I might, I guess, die in the office.

Richie Rump: That kind of reminds me of the old bear skit form SNL, you know, that’s my fourth heart attack this week.

Erik Darling: That was a good one. That was, like, one of the last funny things SNL ever did.

Richie Rump: No, no, no they’ve been doing some pretty good stuff recently. I haven’t been watching it live but I’ve been watching it on digital and they – Bill Hayder was on this week and he broke like five times; it was great. He just kept breaking. He was like, “I’m not a cast member anymore. I don’t have to worry about my job. I’m going to go ahead and break.” Cracking up in the middle of his stuff.

Erik Darling: I don’t know, unless he’s trying to come back on SNL. But to get back to Seybou’s question a little bit, so after backups, of course, CHECKDB and making sure you have corruption checks running, after that, everything else is kind of less important as business priorities. So make sure you you’re meeting RPO and RTO goals, then after that, make sure you’re meeting whatever SLAs you have with customers, and then after that you can start focusing on stuff like performance, whether it’s query or index tuning.

Please, feel free – I mean, it’s free, give sp_Blitz a run. Head over to our first responder kit, go grab a recent copy of sp_Blitz, give it a run, see if there’s any surprises. Best way to do things, I think; it’s neatly packaged up for you.

Richie Rump: We have this new product that we’ve just been slowly rolling out called ConstantCare. And ConstantCare will actually go in and do all of that stuff for you. It will go in – send us the data every day and we’ll send you an email. Right now we’re once a week; eventually, it will be once a day and it will actually tell you, “Hey, this one backup didn’t run,” or it will tell you go do this or go do that. So that is an option and you don’t need to worry about running scripts or anything like that, you do one install and it does it all for you. So obviously, it’s something near and dear to my heart, because you’re my life, but yes, since we’re not getting the word out, I can now talk about it.

Erik Darling: Finally, like a think that you’ve been working on almost since you first started here and everyone’s just like, what the hell is Richie doing? Well, now we know.

Richie Rump: What is he doing?

Erik Darling: Just Paste The Plan, really? That’s it?

Richie Rump: I know, and he hasn’t done anything in like a year, what’s up with that?

Erik Darling: And it barely works – just kidding, it works fine.

What separates a junior vs senior DBA?

Erik Darling: Anyway, let’s see, “I’m currently a BI developer but I’m interested in becoming a DBA as a future goal. What aspects of DBA do you consider basic versus advanced, i.e. DBA versus senior DBA? For example, would you put replication HADR in the basic or advanced group?” Well, there’s a basic and advanced level of knowledge with anything, so what I would do is forget about replication immediately. Don’t pursue that. it’s really not the future – it’s not an HADR feature, for one and it’s not really the direction that anyone’s going to be heading in if they need HADR. As SQL Server progresses and Availability Groups, and now basic Availability Groups on Standard Edition get, you know, more robust and less brittle, that is going to be the pretty obvious, you know, the HADR solution for just about everyone moving forward.

But like I said, there’s a basic and advanced level of knowledge with anything, you know. Knowing how to create an index is a whole lot different from knowing how to tune an index. The same thing goes for query writing, you know, and knowing how to write a query is a lot different form knowing how to tune a query. So it really depends where you see yourself as a DBA.

The days of the Swiss Army Knife DBA are slowly drying up. If you want to really be senior at anything, you have to specialize, like a son of a gun, and you have to get really good at something. There are just simply too many products, too many features and too many quirks and too many other things for someone to really have a senior level of knowledge across the board. I’m sure there are some people who can get that, but they’re people who have had those building blocks for years now. Like if you take someone like Bob Ward, who will be able to troubleshoot an Availability Group, tune a query, figure out indexes, you know, run a debugger, do stuff in extended events. There’s obviously the renaissance men out there who can do all that, but one of my absolute heroes in the world, Paul White, probably knows little about Availability Groups but knows the optimizer intimate.

So if you want to be a senior DBA, you have to specialize, you have to figure out what you’re passionate about as a DBA. If it’s the HADR stuff then you have your direction there. If it’s the query and index tuning then that’s your direction for that.

Richie Rump: Yeah, and now, since this is a future goal and you’re looking into becoming a DBA, then you still need to understand the concepts so you could actually talk intelligently about them. So if you’re going for a job and you’re not focusing on replication but they’re doing replication, you at least want to have a conversation about that and understand why replication isn’t the future and why this other technology is. So, you know, don’t avoid it but don’t go in depth if it doesn’t interest you.

Erik Darling: Obviously, you know, DBAs need a certain level of knowledge about little things, even if they’re not going to specialize in it. Like, if you’re going to call yourself a DBA, you should at least know how to take a backup and do a restore without Googling too much. But, you know, if that’s not your primary goal as a DBA – if your goal is not the backup and restore infrastructure person then obviously you don’t need to know every single in and out about buffer counts and max transfer sizes and all that other stuff. So really pick the route that you want to take as a DBA, or, you know, it seems like a pretty solid decision as a BI developer now. Take your time in figuring out what you really like about the DBA role and then pursue that in a seniorly way, I guess. I don’t know how else to say that.

Richie Rump: No, I think it’s good.

Do I need more than one TempDB data file?

Erik Darling: Good, I got passing grades from Richie so I’m ready to ask the next one. Steve Malcolm, who we’ve been talking to via email, is still confused about tempdb files. Yes, Steve, you need your NDF files, those are secondary data files. Since this is tempdb, you’re going to want to have more than one data file.

Steve asks – Steve sends through an email yesterday to us asking why he had multiple tempdb log files, but it turned out – we’ll get to it in a second, but it turned out that they were both data files. Now, with 2016 and 2017 SQL Server setups, something that’s changed from prior versions of SQL Server, which used to just give you the one tempdb file, is during the setup process there’s a screen that comes up that looks at how many CPUs you have in your system and it gives you one tempdb file per core up to, I think, like eight, then it stops there.

Then the bottom line of that is, like, the link I sent you over about configuring tempdb yesterday, configuring multiple data files on tempdb is generally a wise thing to do. SQL Server can use those files in a parallel manner, it can write to a whole bunch of them at once, which can reduce contention and all sorts of other stuff if you have a busy tempdb. The nice thing about 2016 and 2017 is they made the behavior of trace flags 1117 and 1118 the standard; that’s the default behavior now so you don’t have to turn those on anymore. No, you don’t. Alright, let me catch my breath for a second here.

Richie Rump: I don’t know whatever you just said, it sounded good to me, but, you know, again, I’m not [crosstalk] that that guy is… But then again, I couldn’t get, you know, you to deploy a serverless application either, so.

Erik Darling: No, no and then, you know, that’s the wonderful thing is that Brent can find people who can do the little bibs and bobs of exactly what he needs. He has you if he needs to do some crazy serverless development task that I would have no hope of ever getting past, like, figuring out what to do for step one on. He has me and Tara for the consulting and the DBA type stuff. He’s got things pretty solid; at least I hope he does, anyway. If he doesn’t, I don’t know…

Richie Rump: What do we need Brent for? I think a lot of things, actually.

Erik Darling: He’s like dad; we bring him along because he buys stuff, right.

Richie Rump: Yeah, yeah that’s right, yeah.

Erik Darling: Pays for the movie tickets – he’s a nice guy. He’s fun – he’s fun.

I need to downgrade Enterprise to Standard…

Erik Darling: Let’s see – oh boy, Doug says, “I have a situation…” Not our old Doug. “I have a situation where I need to downgrade SQL 2016 Enterprise to Standard. If I maintain copies of master, model and MSDB and after uninstalling…” Oh boy, jeez. You know, I’ve never had to go through that process. I would highly, strongly, firmly suggest that you try that somewhere because I don’t even know that you would be able to use those copies.

Richie Rump: But it’s the same version, it’s just different editions. I would – yeah, definitely test it out. I’m thinking it probably would work. I don’t know.

Erik Darling: I would want to test that out. I would be concerned about all sorts of things with that. I wouldn’t want to rely on that for a downgrade.

Richie Rump: The first thing that came to mind is that you would just move all your objects over, SSIS or something, you would move the data. But it’s the same version, just different editions, so maybe…

Erik Darling: Maybe, but here’s what I would want to do instead. I would want to side-install. I wouldn’t want to uninstall the first one completely. I would want to side-install Standard edition. And this is actually – a good use for PowerShell is the dbatools.io people have written all sorts of command lets to like copy settings and stuff from one server to another. It’s not quite desired state configuration, but it’s a good – like, you know, if you need to like migrate a bunch of stuff over all at once, it might be a good option. There are a whole bunch of command lets over there that can do that and other cool things.

So head over to dbatools.io, side-install Standard Edition and then script out whatever stuff you want to move over to there. I wouldn’t rely on the uninstall-reinstall and like either have SQL 2016 pick up on the new files or try to like restore stuff over. That just sounds like a nightmare to me. If you do that, you have absolutely no back-out clause. You are stuck. You are hosed with whatever setup you get yourself into. If you do the side install and you just script out the stuff that you need to move over, I think that puts you in a much safer place, even if you have to…

Richie Rump: And if this is a virtual machine, then just spin up a new virtual machine and just do it that way. But wow, good luck because that sounds like a rough one.

Erik Darling: Yeah, that’s not fun at all; the stuff that does not make me want to have a real job ever again.

Erik Darling: Alright, Pablo asks an interesting architectural question. “I have a view that gets three billion rows to make queries there.” I don’t know what that means exactly. “Those rows are across 40 plus databases making unions. Is this structure okay? May I have less databases or just one to avoid the unions?” This is generally the kind of thing that one would want to avoid.

Richie Rump: Yeah, we’d consider that not a best practice.

Erik Darling: Yeah, it’s a little bit more than one could diagnose without looking at things, you know, in a 30-minute free webcast, but generally that’s not a design pattern that we would want to put folks into. If you’re doing like the one database per client thing, I’m totally on board with that. I think that’s a much better pattern than the giant database teaming with everybody even if they’re separated by schema or whatever other crazy things. I think that’s a much smarter way to do things.

If you really need to query data across all the databases, perhaps like a database that you pull data from rather than having to execute queries across all of them at once would make more sense. Like, you could have just a data-dump database and you could have SSIS just move stuff in incrementally during the course of whatever day or period you need. Then again, if what you have written isn’t slow or just not causing any problems, don’t change a thing until it causes problems.

Richie Rump: Yeah, when I think about joining three billion rows in a union…

Erik Darling: That might slow things down.

Richie Rump: yeah, I don’t know what you’re trying to do, but trying to do that with a view would give me the caution to say, wait, what are we trying to do, for one, and is there a better way of doing it than in a view? Maybe there’s a different way we could go about this. I’m assuming that because you have 40 different databases, it’s the same database but all for different customers and those are tough architectures, especially when you’re trying to join them all together and do data off of them.

Typically, you want to put them all into one database, but there’s different solutions for different things that you’re trying to do. So three billion rows in all different database and trying to get them synced together and all that, that’s tough as well – so that’s a tough one but yeah, it all depends on what you’re trying to do and what data you’re trying to [inaudible] from and what information.

What do you use to monitor SQL Server?

Erik Darling: Cool, alright, Sree asks a question. And I swear to god, this is a question, not a plant so that we can talk about ConstantCare again. It is in the transcript. I’m looking at it right now. My name is not Sree and as far as I know, neither is Richie’s. “What kind of monitoring tools do you use to monitor SQL Server instances. Do you have any thoughts on the Grafana Dashboard and alerting?” Well, no I don’t and it’s not because I think it’s bad, it’s just because I’ve never used it or seen it. I don’t really know.

So Brent’s been posting some great posts, Brent, lately about our pseudo monitoring tool. But like he says, it’s more of a mentoring tool. So you have, you know, a good set of vendors out there; SentryOne, Quest Spotlight, people like that who have a cool monitoring tool with a good dashboard that gives you pretty similar metrics across a whole bunch of different areas in SQL Server. But the direction we went is a little bit different because with any monitoring tool, you need to A – figure out how to use the monitoring tool, figure out how to dig into it deeper and then learn how to solve the problems that are causing the monitoring tool to show you to these things.

The direction we’re going with it is, like Brent calls it, a mentoring tool. So what we’re going to do is we’re going to get all the data that a monitoring tool would, we’re going to analyze it, we’re going to do that work for you and then we’re going to send you an email that tells you which buttons to push and which things to do to try to solve those problems. So it’s a little bit different take on monitoring. There’s no dashboard involved. There is no – you don’t get to, you know, look at uptime, downtime, green lights, red lights, you know, charts and graphs and all that stuff. But you also don’t have to learn how to read all those charts and graphs and do all the work behind the scenes.

Richie Rump: Yeah, some of those metrics always made my eyes just cross. Oh, it’s red, is that bad? Is that really a problem? Having all those wait stats, is that a problem? What does that actually do? And then you have to figure out, okay, well what does it actually do? And then you have to do more investigation – is that actually a problem for our server?

Erik Darling: A lot of monitoring tools will just show you whatever is high. So it’s like, if some completely meaningless metric just happens to spike up, like buffer cache hit ratio or like, I don’t know, what’s another weird one? I don’t know, stuff like that. Stuff just pops up and it’s like, oh my god, freak-out. And you’re like, what does that mean? And you’re like, I don’t know, is it bad? How many page lets do I have? I don’t know, what’s going on? You just get so confused staring at all these different numbers that are just changing all the time and you have no idea if they’re good, bad or ugly.

Richie Rump: Yeah, and you know, maybe you had all these extremely high page splits and, oh my gosh, we have a problem, and yet you did a huge load into the database the day before and now you’re like, oh I’ve got a problem. Oh, but wait, there was a load that went in…

Erik Darling: Fun fact, SQL Server counts new page creations as a page split; ha-ha. No one’s looking into that. You insert a million rows, it’s like, oh no, I have a million page splits, dear god, what am I going to do? Funny…

What areas should a DBA focus on?

Erik Darling: Chris asks kind of an interesting question, “What are other areas to focus on as a DBA beside HADR and query tuning?” Well, I guess dev ops might be a thing.

Richie Rump: Starting to.

Erik Darling: I know thankfully little about dev ops so far, so I’ll let Richie expand on that one a little bit.

Richie Rump: Oh thanks, sir, so much. I’ve worked on a couple of dev ops teams. Essentially, it’s the idea of…

Erik Darling: Are you a script-master?

Richie Rump: I am a certified script-master, oddly enough. I have my Agile certification, which means I could read, kind of. It was the dumbest test ever. It was like, the answers were in the questions and it’s like, you think some Microsoft tests are bad, woo, certified script master was even worse. But essentially, it’s the idea of you’re merging your development and your operations together. So as opposed to throwing things over a wall and letting the DBAs handle it, the DBA works very closely with his own team, or is usually on the development team and they kind of work together for these things.

So your dev ops would be, kind of, a merge of a database developer and a DBA kind of into one bundle. They typically would know lots of scripting languages and, kind of, handle things together. They do things like dashboards and monitoring and make sure everybody understands the state of the system and things like that. So yeah, I mean, it’s an option of a way to go. Someone like myself, I could be considered a database developer. I consider myself a data-veloper; you have to pay me if you want to use that. but essentially…

Erik Darling: Copyright it.

Richie Rump: Exactly, so I use [crosstalk]… But essentially, I stand in the middle between the development team and the data team and I make sure all that stuff in the middle kind of goes very fast, whether it’s using ORMs or software or whatever that is. So I kind of have to do both the software and the development side. Not a lot of us guys around, so, I hear it’s a growth industry. You may want to look into that.

Erik Darling: Yeah, software is a big thing apparently; I never would have noticed.

Richie Rump: You know, not a lot of us software developers have opinions. I need you to know that. we’re really easy-going guys. Oh, you like that? Oh, I like Visual Basic, isn’t that great?

Erik Darling: Yeah, let’s high-five and hug and let’s like talk about it civilly over lunch. Let’s not talk about, like, the weird version to version issues that we have.

Richie Rump: yeah, let’s meet on Stack Overflow. I’ve got nothing but great things to say about your question.

Erik Darling: Yeah, it’ll be fun. I love the way you handle nulls. How fast can you serialize that array? I don’t know…

Do I need ConstantCare if I have a monitoring app?

Erik Darling: Alright, Daryl asks a sort of follow up question to the monitoring thing. He says, “I already have a monitoring service 24/7. What would be the point of ConstantCare? Would I want to keep my 24/7 service?” It’s not a replacement for a 24/7 service. This is something that collects data once a day, but right now it checks in with you once a week. Like Richie said, it will eventually move to once a day, where we look at your server, we try to see what’s good, bad, and ugly about your server and we send you an email so you know – and like, we teach your – we have like training videos and stuff so, like, we send you an email. We say, hey, this is what we saw on your server, these are the things going on, these are the things we think you should fix. Here’s how to fix it with, like, you know, links to training videos, blah, blah, blah; stuff like that. So it’s not a replacement for a 24/7 service. It’s going to give you, like, up, down, servers on fire type stuff. What it is, is it’s going to take a look at a server over time and figure out if it’s doing better or worse and how you can make it do better.

Richie Rump: Yeah, I like the way he put it as, it’s a mentoring tool. And some of the feedback we’ve gotten from some people, it’s like, I’m a solo DBA and it’s like I have a co-worker sitting next to me, you know, so I can kind of throw ideas off of because you have a certain level of access to us as well through the service. It’s new, it’s different. I would suggest reading the post that Brent is putting on the site every Monday. I think there’s three of them out already. I’ve been way deep in the code, so even when I read some of that stuff, it’s like, oh okay, yeah that’s right, and I’ve been working on it for a year. So if you want to talk more about this service and development and what it takes and all that stuff then I’m your guy. But if you want to know about what it can do for you and things like that, those posts are phenomenal.

Erik Darling: There’s a whole bunch of new ones coming out.

Richie Rump: Every Monday, I think, they’re going to be dropping, right.

Erik Darling: yeah, for a wile anyway.

If I need to shrink my database…

Erik Darling: So we’re going to finish up, Tom asks, “Is it okay to do a Full Backup then switch the database to recovery mode simple…” You’re lucky Tara’s not here to hear you say recovery mode, “Run DBCC SHRINKFILE and then switch the recovery model back to full?” bad news, you won’t be in full recover model again until you take a full backup after you switch it back to full, so I don’t know. Don’t shrink your databases, just please stop doing it. It’s a bad idea. It’s not a good idea. Like, since 2009 it’s been a known bad idea. Don’t so it anymore. Please stop.

Can I use SETUP to change editions?

Richie Rump: Actually, before we drop, Stephanie had a suggestion saying that, “Setup.EXE has an option to downgrade editions,” which I did not know.

Erik Darling: I’ve never seen that either. Congratulations Stephanie.

Richie Rump: So maybe it’s as easy as just running setup.EXE again and just say downgrade this sucker.

Erik Darling: There you go. Ooh, but that would be interesting if they’re using Enterprise features. So if you were using TDE, you wouldn’t be able to downgrade.

Richie Rump: Yeah, TDE or – well, I guess now even if they’re using partitions, but that’s free now too, right?

Erik Darling: SP1, yeah, 2016 SP1. But there’s a whole bunch of stuff that’s still not, like, available to everyone. Alright, well we should go now; get back to work. Richie has cloud stuff to do. I think I have other things to do too, so…

Richie Rump: I’ve got database tuning.

Erik Darling: Wahoo, you do more of that than I do these days.

Richie Rump: Yeah, it’s all the bad code that I’ve written.

Erik Darling: Ah well, someone’s got to write it. Alright folks, thanks for showing up. We will see you next week; goodbye.

“But I don’t need to back up that database.”

Last Updated May 9, 2018

Backup and Recovery

You’ve got this database that you swear doesn’t need to be backed up, like:

Read-only databases
Schema-only databases with no data in ’em
Archive databases that haven’t changed in years
Maybe even system databases like master/model/msdb

And you think you’re going to save some time in your maintenance windows by skipping backups and corruption checks on those databases.

Lemme tell you a few stories. Everything’s been abstracted to protect the not-so-innocent.

The Successful Failed Deployment – RetailCo needed a software package to manage their inventory, so they bought one from SoftwareInc. RetailCo set up the server and did the backups, but SoftwareInc’s installer absolutely required SA permissions, so they got it. Over time, one of SoftwareInc’s deployment scripts had a bug: it didn’t change the default database, so they accidentally deployed some of their tables in the master database. The app continued to work perfectly – but real user data was going into master. RetailCo wasn’t backing up the master database, figuring they’d just move the user databases over to another server. You can guess what happened when the server crashed.

The Mostly Read-Only Database – The data warehouse archiving strategy involved one database per year with databases going back over a decade. The older databases were marked read-only, so the DBA figured he didn’t need to back them up often. After a migration to a set of brand-new, way-faster SQL Servers, user complaints started coming in about incorrect query results. Turns out each database had views with tax logic in it, and every time the developers did a release, they’d alter the archive databases to make them writable, change the views, and then alter them back to read-only. No big deal – just now apply the same thing to the archive databases again, right? Well, the project manager was white as a sheet and said, “I’m afraid we need to talk about how we’ve been doing something similar to correct product categories for the accounting team.”

The Truly Read-Only Database – The DBA was absolutely positive no one could write to the archive database because he’d done a great job of locking down permissions. He backed up the monster 10TB archive database once, and then faithfully ran CHECKDB on it as often as he could – not too often, given the size. Years later, the CHECKDB failed – because after all, the SAN can still write to the database regardless, and sometimes, storage gets a mind of its own. The DBA said no problem, I’ll just restore it from backup – and then found out that backups over two years old had been deleted as part of the company’s compliance projects. (Yes, in some private companies, folks actually have an obligation to delete data to stay in compliance.)

The Priceless Schema – A SaaS company came up with a creative version management solution. Customer data lived in one database – call it DataDB – but stored procedures & views lived in another, call it LogicDB. With every release, the app built a brand new LogicDB database with all the new stored procedures, views, etc – like LogicDBv1, LogicDBv2, LogicDBv3, etc. When a SaaS customer migrated to a newer version, the support team just changed the default database for that customer’s SQL login, and presto, it started using the new objects. Seemed brilliant at the time, letting customers live on different version levels while calling the same stored procedure names. The DBA didn’t bother backing up the Logic databases – there had been hundreds built up over time, and they could all be regenerated by source control, right? And the DBA had been using a script to sync logins from production to disaster recovery, but she didn’t think about syncing their default databases, too (and the support team was constantly changing them based on what version of the app they were on.) As a result, when they failed over to DR, it was more D, less R.

Brent’s rule: on a production server,
every database is a production database.

Don’t take the initiative to cut corners around data. Your first obligations are to serve and protect the data. Start there first, not with “time-saving” or “cost-cutting” ideas that don’t really accomplish either.

First Responder Kit Release: Driving Miss Data

Last Updated May 9, 2018

First Responder Kit Updates

Ugh. Bungabase slow.

When Grog try count meat, Glug eat meat.

Then Grog get hungry and Glug try count meat.

Bungalock.

Need big head make bungabase fast.

You big head?

You can download the updated FirstResponderKit.zip here.

sp_Blitz Improvements

#1459: A while back, Brent blogged about implicit transactions. In the comments, a smart commenter named Dan pointed out a way to find that with a couple DMVs. If you’d like that information to be available in more places, please consider voting for my Connect item. This check was also added in different ways to BlitzWho and BlitzFirst.
#1483: Add CPU speed to the power settings check.
#1481: Brent made stuff compatible with Managed Instances. Hopefully that holds up.
#1465: Someone had a problem in German. This is an infrequent occurrence.

sp_BlitzCache Improvements

#1516: Added support for correctly displaying Unicode database names ?_?
#1489: We had a whole heck of a lot of checks running. It’s not that they weren’t useful, it’s just that they created a lot of noise, and may not all have been immediately actionable. All that XML parsing was starting to slow things down. I took a bunch of them and changed the code so they only execute if you run it with @ExpertMode = 1.

sp_BlitzFirst Improvements

#1459: See note in sp_Blitz

sp_BlitzIndex Improvements

#1513: Added support for correctly displaying Unicode database names ?_?
#1509: The column names column in the #statistics table is now NVARCHAR(MAX). This was reported to us in the Slack channel! Thanks to Sanket for letting us know.

sp_BlitzWho Improvements

#1459: See note in sp_Blitz

sp_DatabaseRestore Improvements

Nothing this time around

sp_BlitzBackups Improvements

Nothing this time around

sp_BlitzQueryStore Improvements

Nothing this time around

sp_AllNightLog and sp_AllNightLog_Setup Improvements

Nothing this time around

sp_foreachdb Improvements

Nothing this time around

PowerBI

Nothing this time around

sp_BlitzLock

#1500: Column Store deadlocks generate slightly different XML. We now account for those.

#1469: Fixes an issue where come deadlocks were only counted once. Thanks to @HolisticDeveloper!

For Support

When you have questions about how the tools work, talk with the community in the #FirstResponderKit Slack channel. If you need a free invite, hit SQLslack.com. Be patient – it’s staffed with volunteers who have day jobs, heh.
When you find a bug or want something changed, read the contributing.md file.

When you have a question about what the scripts found, first make sure you read the “More Details” URL for any warning you find. We put a lot of work into documentation, and we wouldn’t want someone to yell at you to go read the fine manual. After that, when you’ve still got questions about how something works in SQL Server, post a question at DBA.StackExchange.com and the community (that includes us!) will help. Include exact errors and any applicable screenshots, your SQL Server version number (including the build #), and the version of the tool you’re working with.

You can download the updated FirstResponderKit.zip here.

Hash Join Memory Grant Factors

Last Updated January 8, 2019

Execution Plans, Memory Grants

Buskets

Much like Sorts, Hash Joins require some amount of memory to operate efficiently — without spilling, or spilling too much.

And to a similar degree, the number of rows and columns passed to the Hashing operator matter where the memory grant is concerned. This doesn’t mean Hashing is bad, but you may need to take some extra steps when tuning queries that use them.

The reasons are pretty obvious when you think about the context of a Hash operation, whether it’s a join or aggregation.

All rows from the build side have to arrive at the operator (in parallel plans, usually after a bitmap filter)
The hashing function gets applied to join or grouping columns
In a join, the hashed values from the build side probe hashed values from the outer side
In some cases, the actual values need to be checked as a residual

During all that nonsense, all the columns that you SELECT get dragged along for the ride.

Here’s a quick example!

This query doesn’t return any rows, because Jon Skeet hadn’t hit 1 million rep in the data dump I’m using (Stack Overflow 2010).

/*Returns nothing*/
SELECT u.DisplayName
FROM   dbo.Users AS u
JOIN   dbo.Posts AS p
    ON u.Id = p.OwnerUserId
WHERE  u.Reputation >= 1000000;

/*Returns nothing*/

SELECT u.DisplayName

FROM dbo.Users AS u

JOIN dbo.Posts AS p

ON u.Id = p.OwnerUserId

WHERE u.Reputation >= 1000000;

Despite that, the memory asks for about 7 MB of memory to run. This seems to be the lowest memory grant I could get the optimizer to ask for

If we drop the Reputation filter down a bit so some rows get returned, the memory grant stays the same.

SELECT u.DisplayName
FROM   dbo.Users AS u
JOIN   dbo.Posts AS p
    ON u.Id = p.OwnerUserId
WHERE  u.Reputation >= 500000;

SELECT u.DisplayName

FROM dbo.Users AS u

JOIN dbo.Posts AS p

ON u.Id = p.OwnerUserId

WHERE u.Reputation >= 500000;

That’s why I’m calling 7MB the “base” grant here — that, and if I drop the Reputation filter lower to allow more people in, the grant will go up.

SELECT u.DisplayName
FROM   dbo.Users AS u
JOIN   dbo.Posts AS p
    ON u.Id = p.OwnerUserId
WHERE  u.Reputation >= 400000;

SELECT u.DisplayName

FROM dbo.Users AS u

JOIN dbo.Posts AS p

ON u.Id = p.OwnerUserId

WHERE u.Reputation >= 400000;

But we can also get a grant higher than the base by requesting more columns.

SELECT u.DisplayName, p.Title
FROM   dbo.Users AS u
JOIN   dbo.Posts AS p
    ON u.Id = p.OwnerUserId
WHERE  u.Reputation >= 500000;

SELECT u.DisplayName, p.Title

FROM dbo.Users AS u

JOIN dbo.Posts AS p

ON u.Id = p.OwnerUserId

WHERE u.Reputation >= 500000;

This is more easily accomplished by selecting string data. Again, just like with Sorts, we don’t need to actually sort by string data for the memory grant to go up. We just need to make it pass through a memory consuming operator.

Thanks for reading!

Brent says: you remember how, in the beginning of your career, some old crusty DBA told you to avoid SELECT *? Turns out they were right.

Office Hours Guest Instructor Month

Last Updated March 22, 2018

Company News

In our weekly Office Hours, we hang out and answer questions from you, dear reader.

In April, we’re trying something new: guest hosts. If you’ve got a topical question or something you want to ask a specific person, now’s your chance:

April 4th – let’s talk about performance tuning with @PinalDave
April 11th – ask high availability and DR questions with @EdwinMSarmiento
April 18th – get your PowerShell on with Drew Furgiuele (@Pittfurg)
April 25th – talking getting data in & out of SQL Server with BimlHero @AndyLeonard

Missing Index Impact and Join Type

Last Updated March 22, 2018

Indexing

Just Another Way

No matter how you delve into missing index requests — whether it’s the plan level, DMV analysis, or (forgive me for saying it), DTA, the requests will generally be the same.

They’ll prioritize equality predicates, the columns may or not may be in the right order, the columns may or may not be in the right part of the index, and the impact…

Oh, that impact.

It’s all just a cry for help, anyway.

Like a teenager watching anime and buying intricate parasols.

Salted Grains

If I run these three queries with different join types:

SELECT b.UserId, b.Name, b.Date, p.CreationDate
FROM dbo.Badges AS b
JOIN dbo.Posts AS p
ON b.UserId = p.OwnerUserId
WHERE b.Date = '20170101'
OPTION(LOOP JOIN, MAXDOP 1, RECOMPILE)

SELECT b.UserId, b.Name, b.Date, p.CreationDate
FROM dbo.Badges AS b
JOIN dbo.Posts AS p
ON b.UserId = p.OwnerUserId
WHERE b.Date = '20170101'
OPTION(MERGE JOIN, MAXDOP 1, RECOMPILE)

SELECT b.UserId, b.Name, b.Date, p.CreationDate
FROM dbo.Badges AS b
JOIN dbo.Posts AS p
ON b.UserId = p.OwnerUserId
WHERE b.Date = '20170101'
OPTION(HASH JOIN, MAXDOP 1, RECOMPILE)

SELECT b.UserId, b.Name, b.Date, p.CreationDate

FROM dbo.Badges AS b

JOIN dbo.Posts AS p

ON b.UserId = p.OwnerUserId

WHERE b.Date = '20170101'

OPTION(LOOP JOIN, MAXDOP 1, RECOMPILE)

SELECT b.UserId, b.Name, b.Date, p.CreationDate

FROM dbo.Badges AS b

JOIN dbo.Posts AS p

ON b.UserId = p.OwnerUserId

WHERE b.Date = '20170101'

OPTION(MERGE JOIN, MAXDOP 1, RECOMPILE)

SELECT b.UserId, b.Name, b.Date, p.CreationDate

FROM dbo.Badges AS b

JOIN dbo.Posts AS p

ON b.UserId = p.OwnerUserId

WHERE b.Date = '20170101'

OPTION(HASH JOIN, MAXDOP 1, RECOMPILE)

They’re all going to ask for the same missing index:

USE [StackOverflow]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Badges] ([Date])
INCLUDE ([Name],[UserId])

USE [StackOverflow]

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]

ON [dbo].[Badges] ([Date])

INCLUDE ([Name],[UserId])

Kinda weird already, that a join column is an INCLUDE, but hey.

What’s even weirder is that they have diminishing estimated impacts based on join type.

Loop: 99.1954%
Merge: 42.7795%
Hash: 28.7901%

It gets a bit stranger if I force parallelism!

SELECT b.UserId, b.Name, b.Date, p.CreationDate
FROM dbo.Badges AS b
JOIN dbo.Posts AS p
ON b.UserId = p.OwnerUserId
WHERE b.Date = '20170101'
OPTION(LOOP JOIN, USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'), RECOMPILE)

SELECT b.UserId, b.Name, b.Date, p.CreationDate
FROM dbo.Badges AS b
JOIN dbo.Posts AS p
ON b.UserId = p.OwnerUserId
WHERE b.Date = '20170101'
OPTION(MERGE JOIN, USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'), RECOMPILE)

SELECT b.UserId, b.Name, b.Date, p.CreationDate
FROM dbo.Badges AS b
JOIN dbo.Posts AS p
ON b.UserId = p.OwnerUserId
WHERE b.Date = '20170101'
OPTION(HASH JOIN, USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'), RECOMPILE)

SELECT b.UserId, b.Name, b.Date, p.CreationDate

FROM dbo.Badges AS b

JOIN dbo.Posts AS p

ON b.UserId = p.OwnerUserId

WHERE b.Date = '20170101'

OPTION(LOOP JOIN, USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'), RECOMPILE)

SELECT b.UserId, b.Name, b.Date, p.CreationDate

FROM dbo.Badges AS b

JOIN dbo.Posts AS p

ON b.UserId = p.OwnerUserId

WHERE b.Date = '20170101'

OPTION(MERGE JOIN, USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'), RECOMPILE)

SELECT b.UserId, b.Name, b.Date, p.CreationDate

FROM dbo.Badges AS b

JOIN dbo.Posts AS p

ON b.UserId = p.OwnerUserId

WHERE b.Date = '20170101'

OPTION(HASH JOIN, USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'), RECOMPILE)

Now the estimated impacts look like this:

Loop: 98.9569%
Merge: 29.1982%
Hash: 44.2455%

The Hash and Merge join impacts have just about changed places.

The funny thing is…

I totally agree.

As far as indexes go, that’s a crappy index for the Merge and Hash Join plans. But no better one is being offered, not even sneakily.

For the nested loops plans, it’s super easy to grab the UserIds for that date, and dig into the Posts table for just those.

For the merge join plans, it’s less helpful. While it’s nice that we can easily filter the date predicate, we still have to order our data for the merge join. There are further complications in the parallel version.

For the hash join plans, it’s a similar situation. We need to create a hash table on UserId. It being in the leaf of an index on Date doesn’t help us much, or rather as much as it would if it were in the key.

In short, both the merge and hash join plans have cost-increasing operators thrown into the mix that the index as requested just wouldn’t help.

Thanks for reading!

Brent says: in the Mastering Index Tuning class, these types of examples are why I tell students that you should look at one-key-column index suggestions carefully. In most cases, SQL Server really needs one (or more) of the included fields to be in the key. The hard part is figuring out which one(s) without looking at the execution plans.

Building SQL ConstantCare®: Product, Packaging, and Pricing

Last Updated March 22, 2018