Blog

Training Week: Data Science Fundamentals with a Real Project

Company News
0

When Steph Locke first ran her Data Science Fundamentals with R class, I was amazed at the work she put into the hands-on labs for the students. They had real challenges to help them learn how to model different kinds of data and learn answers from it.

I immediately said to her, “Hey, I wanna give you money. You clearly know what you’re doing. I want to learn more about my own customers. I’ve got a few different piles of data, and I want to figure out who buys training from me. If I can target the right customers, then I can send less emails to people who WON’T buy training, and keep their inboxes clean. Can you help?”

Holy smokes, did she help – she gave me insights about my customers that seem obvious in hindsight, but I wouldn’t have figured out otherwise. It was just awesome.

Now, you can use our data to help you learn. Part of our agreement was that she could use my anonymized data in her own workshops. You get the same source data, and alongside Steph, you will:

  • Consolidate multiple data sources into single dataset ideal for trying to predict who will buy training.
  • Decide the right sampling strategies, so we can build predictions and test them.
  • Build different types of models to see which things influence whether someone will buy training.
  • With multiple models on our hands, which one is the best fit with reality? Use evaluation techniques to identify the model that best identifies Brent’s future customers.
  • Use the model to make some predictions to improve Brent Ozar Unlimited’s marketing strategies.

This isn’t some abstract, meaningless data set – it’s real data, solving real business problems. I know you’re gonna love this – go check it out.


Training Week: Announcing Ben Miller’s PowerShell for DBAs: Level 2

Company News
0

PowerShell for DBAs

Drew Furgiuele‘s PowerShell for DBAs class has been getting rave reviews, including:

“Fantastic class! This is the best online training I’ve ever taken. Very slick setup using the webinar for learning, an AWS VM for labs, and slack for interacting with the class. This course teaches more than just PowerShell for SQL Server – the general training on day 1 will leave you ready to tackle anything with PowerShell. Drew is a great instructor” – Grant Schulte

“I’ve been wanting to add PowerShell to my DBA toolbelt. This class was exactly what I needed. The instructor, Drew, knows his stuff and is well-known in the SQL Server community. If you want to learn PowerShell for SQL Server, start here.” – Kevin Kelso

“I really enjoyed this class! I knew very little PowerShell and was intimidated when looking at scripts I found online. I feel much more comfortable trying my hand at PowerShell after taking this class. Drew was a great instructor, and I thought the material, demos, and labs were all very good. I also appreciate that the class recordings are available – I had to miss part of the class due to some work issues.” – Lois Scarane

I could go on and on – the reviews certainly do, and if you’re interested, you should check ’em out. Drew’s next class starts July 26th, and registration is open now.

So when Ben Miller came to us and suggested doing a Level 2 course, I was hooked. Here’s what he came up with:

PowerShell for DBAs: Level 2

Do you know some PowerShell and want to take your knowledge to the next level? Do you want a consistent way to manage your servers, databases and services without clicking?

PowerShell for DBAs: Level 2In this 2-day class you will learn to:

  • Run commands on remote servers using PowerShell Remoting
  • Set trace flags and protocols on remote servers, then restart them to take effect
  • Use WMI to get information about disk space
  • Use the SQL Server provider to navigate SQL Server within a path-like structure
  • Use PowerShell to determine the existence of objects in SQL Server
  • Learn SMO and use it to manage a SQL Server and Databases. You will learn how to change configuration properties of a server, change Database Options, add space to a database file or log and many other database management tasks.

Prerequisites: attendees should have either taken the PowerShell for DBAs course, or have 6-12 months of hands-on production PowerShell experience.

About the instructor: Ben Miller is a Microsoft SQL Server MVP and Microsoft Certified Master (MCM). He’s been working with SQL Server in the field since 1997, including 7 years at Microsoft. He is passionate about SQL Server automation and integration, and uses PowerShell and SMO regularly.

Like all our classes, it includes Instant Replay so you can stream a recorded version of the class for a year after it finishes. Students have been raving about how helpful that is when they get pulled away from work for an emergency.

Head on over and check it out! Registration is open now for the June 11-12 class.


Training Week: Announcing Edwin’s New Class on Failover Clustered Instances

Clustering, Company News
2 Comments

This week, we’ve got a bunch of announcements about new training classes. Next up, Edwin Sarmiento: his 3-day Always On Availability Group class has been getting great reviews:

“This class is fantastic. There is no filler, and no needless repetition, so be prepared to pay attention the entire duration. Edwin clearly is very passionate about his craft and does an incredible job of sharing his knowledge. Great balance of theory and application. I went through 4 glitter pens taking notes. A+” – Jordan

“Enjoyed the class. It was a different approach that I found very useful. I was looking for information about how and when you would use AG. The troubleshooting aspect was very good and as someone has already said, it helped me to understand the dynamics between all the working parts.” – Paula Luther

“This session was amazing. I had previously had some experience with availability groups, but Edwin’s knowledge really helped me understand the dynamics between all the working parts. I am more confident to work with availability groups, and I fully understand many situations that I did not before.” – Michael

“The detail and pace were fantastic! Edwin’s energy and knowledge ensured a captive audience as evidenced by the level of engagement in the channel – which I also found extremely useful for asking questions or just looking for clarification. Overall, an A+ training session – I’m ready to build my first multi-site AG solution!” – Chris

Now he’s bringing out another class.

Failover Clustered Instances
Failover Clustered Instances

Always On Failover Clustered Instances:
The Senior DBA’s Field Guide

You need to build or manage a SQL Server failover clustered instance (FCI) but you’re not sure where to start. And with Always On Availability Groups as a high availability solution, it becomes even more confusing. Properly sizing the hardware? Who is responsible for what? How does the Windows Server Failover Cluster work with SQL Server?

The SQL Server FCI is not dead. Even though Always On Availability Groups were introduced in SQL Server 2012, customers are still deploying SQL Server FCIs to protect mission-critical databases. This instructor-led training class is specifically designed for senior database administrators responsible for designing, implementing and managing a SQL Server FCI.

In this live 3-day class, attendees will learn:

  • Fundamentals of Windows Server Failover Clustering (WSFC) – the underlying platform that makes SQL Server FCI possible – from the external dependencies like Active Directory and DNS to quorum, shared storage and cluster configuration
  • Designing and implementing a Windows Server Failover Cluster to meet both high availability and disaster recovery requirements
  • Designing and implementing common topologies for SQL Server FCI solutions for a single- or multi-data center deployments
  • Managing and monitoring SQL Server FCI implementations
  • Effective troubleshooting of availability issues for both the Windows Server Failover Cluster and the SQL Server FCI

About the instructor: Edwin Sarmiento is a Microsoft Data Platform MVP and Microsoft Certified Master from Ottawa, Canada specializing in high availability, disaster recovery and system infrastructures running on the Microsoft server technology stack. He is very passionate about technology but has interests in music, professional and organizational development, leadership and management matters when not working with databases.

Online class times:

Head on over and check it out. See you in class!


Performance Tuning in 21 Demos at the PASS Summit 2018

#SQLPass, Company News
0
Performance Tuning in 21 Demos

You’re a developer or DBA who wants to tune indexes and queries to make your application go faster. You don’t like learning by watching slides or reading white papers – you want to jump right into the code and see what happens. You’re the kind of person who loves watching YouTube videos that start with, “Hold my beer and watch this.”

In one demo-filled day with Brent Ozar and Erik Darling, you’ll see SQL Server, Azure SQL DB, and Azure Managed Instances make questionable lifestyle choices. The engine will wildly underestimate the work required with a query, pick the wrong indexes, and jump off the roof into the pool while holding the TV – all in an effort to deliver the right query results. We’ll show you how to coach the database engine into better performance and less problems.

Everybody will get their own USB flash drive to take home with the StackOverflow2013 database (just 50GB, so it’s easier to do the demos on smaller laptops), the demo scripts, and PDFs of the slides.

Attendees will also get 1 year of access to:

Our pre-con sold out with 360 attendees last year, and I heard from a lot of frustrated folks that couldn’t get in because they didn’t move fast enough. Don’t make that mistake again this year – go claim your spot now.

Check out the list of pre-cons, and then register quick.


[Video] Office Hours 2018/4/4 with Special Guest Pinal Dave

Videos
0

This week, Pinal Dave joins Brent, Tara, and Richie to discuss replication latency, partitioning tables, transactional replication, troubleshooting 3-rd party application performance, SQL ConstantCare® update, SQL Server errors, copying data over from a database with read-only access, reducing blocking timeouts, backups to the cloud, and more replication questions!

Here’s the video on YouTube:

Office Hours Webcast - 2018/4/4 with Special Guest Pinal Dave

You can register to attend next week’s Office Hours, or subscribe to our podcast to listen on the go.

Enjoy the Podcast?

Don’t miss an episode, subscribe via iTunes, Stitcher or RSS.
Leave us a review in iTunes

Office Hours – 4-4-18

 

How do you monitor replication latency?

Brent Ozar: Our first question comes in from [Tishal], [Tishal] says, “I was preparing a report using data collected from management data warehouse. I was using one of its tables, snapshots, performance counter values, and it has some columns in it formatted as floats.” He says, “How do you interpret the results of these columns for replication latency?” Tara, you’ve done a lot of replication work. When you want to find out how latent replication is, what do you use in order to do it?

Tara Kizer: I usually just open up replication monitor and then I insert a tracer token. Replication monitor will tell you what the replication is. It’s an estimation, but if you insert a tracer token, it will watch it flow from the publisher to the distributor. It will tell you what that latency is and the distributer and subscriber and what that latency is. So tracer token is a really good way. Some people, via code, they insert tracer tokens and then log that to a table and send out emails, but I figure what the script is to run to query it, but you could run an extended events session to see what replication monitor is doing to grab that latency number anyway and then monitor that via scripts instead of using the GUI.

Brent Ozar: Yeah, use your own scripts to inject a token whenever you want.

 

I run out of disk space when I create a clustered index.

Brent Ozar: Pablo says, “With a million row BI table, I’m trying to create a clustered index and I’m constantly running out of disk space. Will partitioning help?” Pinal, what do you think?

Pinal Dave: That’s a very interesting question. Actually, this reminds me of the time, years ago, when the disks were very, very expensive. I remember, I had experienced this one and I, even though there was no real reason, I had to partition my table. But the partitioning was actually just done just because of disk error. So yes, partitioning would help, but it makes sense that you create the second partition on a different drive. Same drive, you’re going to be walking into the same problem. Different drives – now you need to do a partitioning, partitioning functions as well as you need to go through the center and logic about which is the key and then you create this horizontal partitioning.

So yes, partitioning would help, but if this is not something you’re looking for, you may want to look at how you can get more space on a disk by changing the disk. Maybe a [longer run tact] would be easier, or any other thing like delete some stuff or move some of the database out of that partition. Or do something like that, that would be more meaningful instead of just doing partitioning for the sake of partitioning. And when you are creating the clustered index, the heart of your table – so that’s my opinion. So create partitioning if there is no option, otherwise, just do what works, actually. At the end of the day, your boss should say, you are able to manage your database.

Brent Ozar: That’s what I say to Richie all the time. Richie, you are able to manage your database. I was just, not even an hour ago, I was telling Richie, “Richie, I want to buy a bigger database server.” And he was like, nope.

Richie Rump: It’s not our problem, Brent. “But I want to. I want to try it.”

Brent Ozar: I want to throw money at it.

 

Is replication a good fit to copy data between servers?

Brent Ozar: J.D. asks, he wants to do – he says, “Is transactional replication the best way to copy data from a vendor app on server one to a different database on server two? It needs to be as close to real-time as possible. How should I move data between those?”

Tara Kizer: What is with all the replication questions?

Brent Ozar: It’s because you’re here, Tara.

Tara Kizer: I mean, best – I don’t know. Are you using Standard Edition, Enterprise Edition – because if I’m using Enterprise Edition, I’m not using transactional replication. I’m going to be using availability groups and, you know – it just depends what you’re trying to do here. All of the technologies have some latency, unless you’re doing synchronous through mirroring in availability groups.

Brent Ozar: He follows up with, he’s, “On Enterprise but it’s 2008R2.” It does say that they might upgrade to 2017 soon.

Tara Kizer: Yeah, so transactional replication is a good tool to copy your data to another server, but you have to be aware that there is going to be latency at times when big transactions run. Make sure your replication carpology is best practices. You’ve got a publisher on one server, a distributor on another server and your subscribers on another server. Don’t put the distributor on the publisher or subscriber.

I mean, you could, if you’ve got the hardware for it, but best practice is to separate these guys. And just know that it’s replicating inert, update and delete commands and updates get converted into a delete and an insert. So it’s at a very high level and so there can be some latency flowing through the pipeline, whereas mirroring and availability groups occur at a much lower level. But yeah, 2008 Enterprise Edition transactional replication is a good feature, it’s just I’ve spent so many hours troubleshooting it and, you know, waiting for a snapshot to run and waiting for initialization to happen. I mean, wow, a lot of hours.

Brent Ozar: If somebody offered you a new production DBA job and they said you can be a DBA but you have to work with replication fulltime, what would you say?

Tara Kizer: I don’t know – for me, taking another job, it depends upon benefits. I start there and a lot of times I don’t care how bad the job is if the benefits are good. But for me, am I going to be on call and have to work at three in the morning all the time – I don’t mind replication so much, you know. It was stable in my last environment and we didn’t have to do too much with it.

Pinal Dave: When it works, yes.

Tara Kizer: When it works, exactly, that’s the key…

Pinal Dave: So one thing I would just make a comment here is that you will find a lot of experts who can configure the replication, but the people who can just really troubleshoot the replication are, I think, there are only 1%. It’s not an easy task and, yes, configuring, click, click, click, go, done, fine, but how do you fix it?

Tara Kizer: And I’ve even opened up a case with Microsoft, many years ago, to – I was getting a weird replication error and I knew that the snapshot and that whole process was going to take several hours and so I was hoping to fix replication rather than go through that process. And the things that they had me do to replication is not something that you find blogs about. So, I had to open this case with them – and I don’t remember what the outcome was, if they fixed it or not because it was just way too many years ago. But it was just crazy, the queries I was running. And they’re like, grab this, grab that, look at this – it’s just like, how are you supposed to know how to do this stuff?

Brent Ozar: Richie, you look like there’s something you want to say in there too.

Richie Rump: I hate the Cubs.

Brent Ozar: Did they lose?

Richie Rump: They, like, haven’t scored a run in like two games, so. But that’s what I wanted to say, so…

Brent Ozar: You can always tell when baseball season starts with Richie. He’s very preoccupied – this is his one. Justin Setliff says, “Richie, don’t worry. It’s early.” It looked like a blue screen of death flag for a second there.

Richie Rump: And that’s what it feels like.

 

How do you prove other services are causing performance problems?

Brent Ozar: Nicholas asks, “Hi gang, how can you prove that other processes and services like Apache or internet information services on the same box are causing the SQL Server performance problems?”

Tara Kizer: I would never be able to prove it because I wouldn’t allow those on my SQL Server. So I mean, I need a production DBA here and I just – no, you’re not putting that stuff on my box.

Brent Ozar: And why not?

Tara Kizer: Because SQL Server is a memory hog, IS is a memory hog, I mean, they could be on the same box if you’re talking about a very, very small system, I guess. I mean, maybe in a test environment, but in production, I want to go by best practices and try to avoid having issues at three in the morning.

Pinal Dave: Right, and one of the things – yesterday, I was traveling through India and I was in Pune and my customer had an interesting scenario. They thought the reporting service is just fine and they have absolutely no issue and they put it on the same box as SQL Server. And I was like, okay, we just try to debug why they have a lot of I/O and we tried to start debugging with the various T-SQL scripts. And suddenly, the accidentally opened a task manager and we can see from the task manager’s view that reporting service is taking the maximum amount of the memory.

They were like, “I don’t know why, what was going on.” And we can see that it was taking 30% of their box’s memory and we were like, whoa. I mean, we didn’t even have to run any diagnostic script, the task manager was saying it. And as soon as we had just decided to turn it off and kill it and they had a little minor performance improvement; minor. And we believe, after they might have restarted, it would have worked. So once in a while, the layman’s, or not so smart, solution, like task manager, also tells a story. So we should definitely run all the fancier script, but do not ignore the task manager. That’s what I just told my customer yesterday and they thought, “Oh do you tell this to everybody?” I said, “All the time,” though even I said first time yesterday, yes.

Tara Kizer: I too use task manager. Sure, I know how to use other things also, but it’s so easy – right click, task manager, and there it is.

Brent Ozar: And everybody gets it, like, they understand, we’re not hiding something from you, it’s built into the system. Nicholas says about why he put things on multiple servers, he says, “Sometimes you inherit things.” Man, you’ve got to get better relatives so you don’t have such crappy inheritance.

 

Why are vendors telling me to replicate to the cloud?

Brent Ozar: Daryl says, “I got an email from a vendor yesterday telling me that I should replicate to the cloud. I thought last week that we were saying we should not do replication. I thought that Brent Ozar Unlimited was saying not do replication.” Well generally, if a vendor is telling you to do something, they’re usually also selling you a tool to do it at exactly the same time. So just think about that one.

Tara Kizer: What problem are they trying to solve here? Why are they saying to replicate to the cloud? What is the issue here? Is it disaster recovery? What is it?

Pinal Dave: I’ll just add a point there because I think I know a little bit of context of this because I have seen a similar email and I little bit read through it. I don’t know exactly situation of the Daryl, but I will just say it this way, I think Quest has a product, I think, and that product is trying to talk about hybrid replication, which is not native to the SQL Server and they say you can go to hybrid as well as heterogeneous environment. And at that time, Always On is not a solution, might be you may want to consider replication. So yes, replication is still true, but in replication not in terms of the SQL Server replication, replication in terms of taking your data and creating a replica of it at a different location. So maybe that’s what they are meaning. And the product might be able to solve the problem, as Brent clearly said.

Brent Ozar: And I’m a fan of putting data in the cloud, too. Like, if you want t reporting copy up there – because classically, a lot of our users are out in the cloud or they’re out somewhere else. And if your building’s internet connection goes down and they still want to be able to run their reports, replicating up to the cloud can make sense. You just want to make sure that you’re solving a problem that the business has.

Richie Rump: I just want to point out that Tara’s wearing a Quest sweatshirt.

Tara Kizer: Usually, my old camera would have been up to here. It’s just cold in my house and this is my favorite thing around the house.

Brent Ozar: I’m not going to lie, Quest has – the Quest velour kind of, whatever that thing is, that’s one of my favorite things that I keep over in the office as well. It’s so soft. If you ever get the chance to get, not a vendor t-shirt, because vendor t-shirts aren’t that good, but if a vendor gives you something nice like a hoodie, a sweatshirt, it’s going to be really nice. Same thing with the jackets – they give good stuff out.

Richie Rump: Yeah, I have an Azure hoodie that I got from the PDC where they announced Azure, when nobody could say Azure… I still have that and I wear it all the time.

Brent Ozar: Nicholas points out that when we were at Quest, we did shirts, Kevin Kline is Devine was one of the t-shirts and that was really good.

 

How can I fix performance in a third party app?

Brent Ozar: Pablo says, “I have this third-party application and it keeps creating lots of tables. And then, when we run reports, it does a bunch of union alls that take forever. What should I do to help this application’s performance?” Everybody has the same look…

Pinal Dave: Yeah, exactly. I just thought – I think they got us at one word, which is called third-party app. As soon as you say it, we just know what it is about. There are thousands of things we can talk about, union all is just the one thing which you can see probably. And there are so many things like indexes, statistics and hints they’re using inside the core. I’m sure there are so many views to support, so you can’t see what is written in the view, which you can just have to open it, encrypted stored procedure comes handy when you don’t want it – so many things, third-party app could have it, but I mean, this is the right place. I can tell you, there are three resources you can watch, I/O, memory and CPU. Those are the three going to be heavily consumed by this third-party app, no matter what you do. So whatever you can do at a database level, application level, without even touching the code, should be your priority when dealing with a third-party app because most of the third-party apps, as soon as you try to cut something, they somehow make sure they, for any other things that are not even related, they blame you for that.

Brent Ozar: True…

Pinal Dave: I mean, like, dropping an index and – yes, okay, can I tell a quick story before I let Brent and the team answer? Yesterday, I advised one organization who are using a product and I said why don’t you drop some of the indexes and just for trying, I said, let’s check all entire your code base and see if anywhere you are using that index hint, otherwise that would break. So they said sure. They went through the code base search. They couldn’t find that index and they decided to drop that index immediately. Then one of the major screen broke. They figured it out – they’re using now not even the index hint that way with the name of the index. They are using index hint with, like, the inter-parenthesis one, two, three, four, five. They are addressing the index not with the hint name, not index name. they are addressing them with the ID of the index, like four. So [crosstalk] you recreate that index, the fourth index would be something else. Think about what would be your performance. Third-party app can do anything.

Brent Ozar: That’s the worst hint I think I could imagine is specifying index hints by number because – say that somebody accidentally drops it, you don’t even know what it was. That’s terrible.

Tara Kizer: I’ve never even seen people do that. I didn’t even know that that was an option.

Pinal Dave: It’s an option and people practice it. And when I ask them and said why they do that, and their answer was beautiful, “Oh we come from a different RDBMS experience.”  I was like, well, saying you are from that Oracle or MySQL or Postgres does not make you any smarter.

Brent Ozar: And they were probably bad in that database platform too. It’s not an excuse. It’s like being caught with a pile of drugs and saying, “Oh I’m from France. It’s legal in France too.” It doesn’t make a difference.

 

How has the reception been for SQL ConstantCare?

Brent Ozar: Let’s see, Greg asks, “How’s the reception been for ConstantCare? I’m on day one and I love what I see so far.” Greg, it was funny to see you sign up too. I think we broke past 300 servers. I know we’re up past 100 users and I think we’re past 300 servers.

Richie Rump: Yep, something like that.

Brent Ozar: Yes, we’re up over 150TB of databases monitored. So it’s been a lot of fun. It’s been keeping me busier than a one-armed paper hanger trying to give advice to these people for servers, so it’s been pretty funny.

Pinal Dave: I have one question for you because it’s a V1. I have been reading myself – and this question is from me actually and I’m just inserting because I get the opportunity. Who writes all these things? I mean, it doesn’t look automated.

Tara Kizer: It was me.

Brent Ozar: We started with sp_Blitz. So sp_Blitz gives you all these warnings about your stuff. So I told Richie, here’s what we want to build and we’re going to take the rules from sp_Blitz, and that’s like all the input he had form me and he, like, boom, took off.

Pinal Dave: the VMware suggestion, it blew my mind. I didn’t know that. just reading that email about VMware, I learned something. I was like, oh, I think a couple of my customers in the past, I never talk about this. They might be facing this. They put complete trust in my and I didn’t know that part even. I just learned a second ago.

Brent Ozar: To me, I think that’s one of the markets too that we’re aiming for is consultants and independent freelancers because it’s so nice to have a second opinion. Just tell me what’s going on else that I might have missed. And so it’s really good to get this list of, you forgot about this one rare issue, you know, this one rare poison wait or whatever. So it’s been a lot of fun with that.

Richie Rump: [crosstalk] a blast doing all this stuff and I know he is because he gets quiet and I’m like, oh he’s just having too much fun over there.

Brent Ozar: And Richie can spell the customer emails that I send out too, like seeing out what the recommendations are. So he sees where I’m tweaking things from the system and how I go into details and it’s fun.

Pinal Dave: This is amazing because one other thing I’ll tell you – because one place you have mentioned about CUs, it was so nice to talk about CUs because I knew that one and I was so happy; a lot of people do not know. So if you install SQL Server 2016 SP1 right away, suddenly you will see your locks going crazy. Your locks are so many that you would be like, what happened? I didn’t change my code. Everything was fine in 2014. And as soon as you start updating the latest CU, suddenly all the locks will disappear and there was resource semaphore. And then, we just had to create one index and resource semaphore one, but if somebody would have not updated the CU, which the suggestion was, they don’t have ConstantCare. And I was like, I can clearly see if this guy has not done CU and four of the servers were on a default, he would be facing the locking scenario, crazy, thinking they have a bad code and that’s not the case. It’s just CU update you just have to do and your life is all good after that.

Brent Ozar: Yeah, and if you apply a bad CU, if you apply a CU that breaks things, I want to be able to tell people very quickly.

Pinal Dave: For sure, yes.

 

I’ve been getting database mail errors about spawning processes…

Brent Ozar: Let’s see, Mike asks, “I’ve been getting errors…” Oh my goodness, Mike says, “I’ve been seeing SQL Server fail to spawn a thread to process a new login in the middle of the night. Should I add logging schedulers?” he wants to start logging different DMVs, schedulers, workers, tasks, every five minutes. “What should I do? It’s a brand new server with only one database. It’s got two cores, 8GBs of RAM and SQL Server Enterprise Edition.”

Tara Kizer: Why is this server so small when you’re spending so much money on Enterprise Edition? That’s what I want to know right off the bat. 8GB of RAM? My new desktop has a lot more.

Brent Ozar: And the two cores thing is tricky too because the minimum core licensing for a VM is four cores. Now, if you’re licensing by the host, you can license the whole host and then start small, but two cores of Enterprise Edition is $14,000. I mean, it’s a big deal.

Tara Kizer: I’m not even familiar with that message. Is that a specific SQL Server message? Because that doesn’t – I’ve looked at a lot of error messages in the past 20 years or so; that’s not one of them I’m familiar with. I’m wondering if this is something, an application error, instead.

Brent Ozar: Ooh, Mike says, “It came from DBMail.” Wait, so now this really makes me suspicious of the number of messages you’re processing in terms of DBMail. That seems sketchy. We should probably ask around the question too. So my personal thought on DBMAIL is I don’t want it to email customers directly. It’s one of those things that I would use for administration type stuff, but I wouldn’t want to try to make an app server out of it. It’s okay, it just doesn’t scale that well. Mike says, “It sends to DBAs.” Yeah, that’s not so bad. So I could see that. I don’t know that I would start by logging DMVs though. I would start by looking at wait stats – look at wait stats overall and see what you’re looking at then.

Pinal Dave: Because there are only two CPUs, so when he says CPUs, I assume there are two processors, right, then he might be running out of the threads during the night due to maybe some other operations. It might be conflicting with your other maintenance jobs. Just possible, thinking, because you said midnight.

Brent Ozar: Yeah, so it was probably got 50 jobs all starting off at exactly midnight. I used to do that as a DBA. It’s the middle of the night – just set everything to midnight. Then you could see all the lights in the data center get dim right at exactly midnight…

Richie Rump: It’s like four o’clock for ConstantCare.

Pinal Dave: right, when somebody said midnight, that’s the only thing that comes to my mind because I don’t know why, when I was a beginner, 12PM was like the most holiest time to do pretty much everything; fire off the backup, fire up the index maintenance and let them run in a catch-22 situation.

Brent Ozar: the server is totally idle from like 10 PM to midnight. There’s nothing happening, and then it’s like [crosstalk] of football players go running into the thing at the same time.

Pinal Dave: Right exactly at midnight, and yes, and [they never finish]…

Tara Kizer: And just to give an extra hint, look for thread pool waits. And so what I would do, for cheap monitoring, just log WhoIsActive to a table every 30 seconds, every minute. You could do it all day long, like I’ve done in the past, but if you just want to monitor for this specific issue, log that and then look for thread pool waits. You should see, in the info column, I believe it is. Look for blocking, but it might not be blocking. It might just be running out of worker threads because you only have two CPUs.

Brent Ozar: I should also point out that if you search on Bing for sp_WhoIsActive log to table, you get some really pretty adds on the side there about side tables made of logs. Tara has a blog post on logging activity with sp_WhoIsActive there that’s really good.

 

I need to sync data from a read-only database…

Brent Ozar: Oh, let’s see. Next up, Steven says, “I only have read access to a database and I need to take the data out of that and sync it to another database that I have full control over. What way would you use to get that done?”

Tara Kizer: You’re very limited. SSIS probably – I mean, you only have read access, you can’t add triggers, you can’t do anything. So you’re going to need some kind of crosses.

Pinal Dave: Right, or just backup and restore and remove the read restriction. Just thinking loudly, even if that is possible.

Brent Ozar: And Richie used to do a lot of this kind of thing…

Richie Rump: Yeah, when he said sync, that’s the key word for me. It’s like, what does that mean? Does that mean you need to have – hey, I’ve got one source of truth here and I’ve got the other in the other database and I’ve got to merge them together. That is a lot more difficult than just saying, I have an ID here and I just need to see if it exists or not in the other one. So SSIS, I guess, if it’s the easy one. If you need to merge them together, you’re looking at a lot of work there. That’s not simple.

Brent Ozar: At least he said he only has read access to one of them, so it’s probably not that, thank goodness because that is terrible.

Richie Rump: Yeah, I mean, it said sync, you know. When I see sync, I don’t think – I’ve spoken to too many executives, so that’s my problem.

Brent Ozar: Well, and they always end up saying the same thing, “We just need to get this one row back to the other side. It’s just one row. How hard can it be?”

 

How can I reduce blocking between an insert and an update?

Brent Ozar: Naveen asks, “We have two processors running on SQL Server. One’s doing an update, one’s doing an insert. They both kick off at the same time and they’re getting blocked by each other. What should I do to reduce blocking timeouts?”

Tara Kizer: Who is the asker on this one? Read it again.

Brent Ozar: Naveen. And so for behind the scenes stuff for you all, we can all see the same panel of questions but so many of you pick out the questions, the reason why I read your name out, it’s not because I care about you, I don’t care about you at all, but I want the rest of my co-presenters to see which question we’re dealing with.

Richie Rump: But we’d love to mentor you…

Pinal Dave: The update statement is timing out. That means it’s not definite. I mean, it’s just locked on that select or when both of them started a process. One thing, if they’re deadlocking or just locking, and it says after 30 seconds of blocking. So is it an application thing? Maybe there is a timeout. If he just waits a little bit longer, it might be finished. I’m just thinking loudly. Like sometimes, you know, we give very short timeouts and things have to – and I tell everybody that locking is alright, it’s just all about waiting game. Once somebody finishes, the second person is going to get a turn, but deadlocking is bad, so that’s the reason that timing out it immediately. So when I see this one, I might just think that maybe you could change the timing, if possible, if you can do it. If that is not possible, see if one of the processes finishes first so you don’t have to wait for a 30-second timeout to kick in. and if, no matter what you do, if 30-second timeout is still kicking in, well, one of the processes is going to have to be finished one time or another time, then I think another one has to wait. So increase the timeout, and if all of them is not an option, then you just have to look at your company and see your database and see what blockers are there and take from WhoIsActive you can take it, or you can take it sp_Blitz – start seeing all the blockers and start removing one at a time.

Tara Kizer: I would just wonder, because the update statement is the one that’s timing out, so the insert one is the one that’s causing the blocking. Is this a single row insert? Investigate that process. Is it a very large query that’s getting inserted into a table? Investigate the insert and see if that can happen any faster or any indexes that can be added. If it’s just a singleton insert, you know, I doubt that that’s the culprit here.

Brent Ozar: Oh my god, he follows up with, “The insert’s doing 20,000 rows.”

Tara Kizer: Okay, well, I mean, you know, break that up if it can’t complete in 30 seconds. So look at your processes and do things differently. Is it a bulk insert? A bulk insert can be pretty fast. 20,000 rows isn’t any big deal for a bulk insert.

Pinal Dave: What Tara said is very, very true. One of the demonstrations is my favorite demonstrations, for pretty much all my presentations I open with it, where I just show them that how a lot of indexes can slow down your inserts and a lot of people even think it’s linear. So if you have one index, your system is taking three seconds to update. If you create two indexes, a lot of people say it will take six indexes to update. That’s not the case.

When you create one additional index, amount of the administrative tasks SQL Server has to do around that index update is way more than just doubling the time than your original update. So it’s quite possible, a lot of people say this is very generic advice, they say just remove all your indexes and recreate back. That is something your ETL process has to do, but again, that’s just a tradeoff. So you save time now, and when you rebuild all the indexes, you are going to lose the same time. So end result, it’s going to be the same thing, it’s just what makes you happy.

 

How can I capture who does what?

Brent Ozar: Let’s see, Emily says she’s been using a trigger to find out which login is logging in, like what users are logging in, and what database they access. But when they log in, people usually seem to log into master. “Is there a way that I can capture what queries people do in which databases?”

Tara Kizer: What are you trying to solve here? Because you’re going to be adding some performance overhead if you want to figure out what people are doing. I mean, you could possibly set the default database to whatever database they should be in. That way, they don’t go to master first because they’re going to end up in the right database with their queries, since the applications are working. But if you switch the default database to whatever database they should be in, you might be able to have easier success with this. I just wonder what you’re trying to solve here. I mean, you’re obviously auditing and this is going to eventually cause performance issues.

Brent Ozar: It’s hard when people do cross-database queries too, you know. They’ll join across five different databases.

Richie Rump: Because they can.

Brent Ozar: And union all, because they’re third-party vendors.

 

How can I get my data offsite fast?

Brent Ozar: Daryl says, “I want to get my data offsite. Y’all have mentioned backups to the cloud. What options do I have to get my data off to the cloud that would get me protected quickly?” And Tara wrote a white paper about this.

Tara Kizer: I like referring to that whitepaper with clients that don’t have any disaster recovery in place and they’re not willing or not able to currently spend any kind of money on a disaster recover solution. So I’m just like, well, how about just this cheap solution where you just send your backups to a storage bucket somewhere in the cloud, and then you at least have that. I mean, obviously, you’re going to need to send your source code. You can’t just have a database without an application, you know. So at least get that stuff out there. That way, if the primary site ever goes down, you could spin up servers in the cloud and then restore everything after that. It’s not going to be easy. It’s certainly not going to be easy.

From the database perspective, it’s fairly easy. The whitepaper covers all the stuff that needs to happen, but getting your whole environment up and running up there is going to be very, very painful, but at least you’re not toast completely.

Brent Ozar: It’s practically free too. It’s really cheap to get the log shipping going up there.

 

Should I drop my indexed views when changing replication?

Brent Ozar: And then the last one we’ll do, J.D. says, “I’m sorry for another replication question. We have indexed views on our replicated database, but because they require schema binding, we’re using pre and post replication scripts to drop and create the views and indexes. Is this a bad idea?

Tara Kizer: I don’t necessarily know if it’s a bad idea or not. It sounds like it’s working for you, so how can it really be a good idea if it’s working for you? I have not used indexed views in conjunction with replication, but I have had to use post scripts for replication where our source schema and the publisher was going to be different than the subscriber. So in the replication stored procedures, we changed those inserts and updates – it was just the insert and update stored procedures to modify the schema, and that worked fine for us. So if it’s working for you, is it a bad idea? I’ve never heard that it’s a bad idea, but I don’t know how many people are using replication in conjunction with indexed views.

Brent Ozar: Well thanks a lot everybody for hanging out with us this week. Thanks, Pinal for joining us and look forward to seeing people in your training class coming up in June as well.

Pinal Dave: Oh yes, I’m pretty excited. So yes, as we start to discuss, there are a few signups and I’m looking for a few more people to join in with us because I can promise it’s going to be fun and a lot of interesting demonstrations. And most important thing, what I’m looking at is the module three where cheating is allowed. I want to see how people cheat with each other and come up with the wrong answer.

So that’s going to be fun. If you come up with the right answer, bravo, but most probably, I’m going to be in there, so the person who gets the most number of the wrong answer, you are going to be the winner that day.

Brent Ozar: Nice. We’ll see everybody next week at Office Hours. Adios, everybody.


How To Break SQL Server’s XML Data Collection

Humor
0

Oh, XML

When we first met, it was about 2011. I had to load files full of you into a table. It wasn’t so bad.

Seriously.

You were well-formed, and didn’t change from file to file. Even using SQL Server 2005, I could take you.

Later on, we got really close when I awkwardly started working on sp_BlitzCache, like a divorce-dad trying to figure out what you like so our weekend visits wouldn’t be so painful.

We learned to get along. Heck, we had some good times.

But the plan cache isn’t the only place that you get used. No, Microsoft uses you in all sorts of crazy, mixed-up places.

  1. Extended Events
  2. Deadlocks
  3. Service Broker (I know, I know…)
  4. SSIS… things. Probably.

There’s likely a bunch more, but hey. There’s only so much sunshine.

It’s Time We Talk

No, I’m not leaving you for JSON — JSON isn’t looking for anything serious. Heck, we’re not even sure if JSON will get deprecated for whatever is popular on Hacker News in a couple weeks.

It’s just that, when it comes to storing information about deadlocks, you haven’t been handling yourself so well.

I think you’re on Predefined Entities. I think you have a problem.

You see, if someone creates a table that has a funny character in it — “, &, >, or < — you don’t sanitize all your inputs. Then you throw an error when we try to parse you.

The worst part is that you get it right sometimes.

Call me Sometimes.

But other times… Other times!

Lordy Lordy, Lordy

Even execution plan XML gets this right.

smh

We Got Some Breaking Up to Do

If you don’t change your ways, (hint — that’s a Feedback Item, you should go vote for it if you want me to like you), you’re just going to be another known limitation in a long line of scripts.

Thanks for reading!


Why sp_BlitzLock Can’t Show Complete Columnstore Deadlock Information

Deadlocks
6 Comments

People seem to care about deadlocks

That’s why I wrote sp_BlitzLock. Why not sp_BlitzDeadlock?

Well, I had this song stuck in my head. But enough about that.

While poking around with things recently, I created a pretty typical deadlock on a table, but this one had a clustered column store index on it.

Of course, that makes things different.

objectivity

So we go digging

The deadlocks I generated were on a single object. Just a couple inserts and deletes in two separate transactions. Nothing special.

This is the information that I pull out with sp_BlitzLock

Hey Mister DJ

But what’s missing from the column store deadlock information is really weird. Namely, everything.

Oooookay

One may be tempted to try to use the associated object id to get the object name, and one would be left wanting.

I wish you would.

It’s a bug in Microsoft SQL Server.

I am currently unable to give you as much information about column store deadlocks as I’d like to.

Unless Microsoft responds to my Connect Feedback Item.

Brent says: this is such a great example of why niche features run into interoperability issues. They work just fine by themselves, but they’re not tested in combination with other things – even when those things might be mainstream, like deadlock graphs. I bet this bug gets fixed quickly though – with columnstore indexes gaining popularity, more folks will be running into this issue fast.

Update June 2021: Microsoft still hasn’t responded to the bug report.


Building SQL ConstantCare®: 10% of you have Priority Boost on.

SQL ConstantCare
2 Comments

One of my favorite things about my job is being able to take the pulse of how real companies are managing their databases. I don’t wanna be locked in an ivory tower, preaching down to you, dear reader, about how you should be doing five million things a day – when in reality, you’re struggling to get an hour of real work done because you’ve got so many meetings.

But sometimes I wonder – am I out of touch? When I was a DBA, I remember struggling with backups and corruption checking – has that gotten easier? Have today’s DBAs started using more automation and tools to protect their estate? Is Transparent Data Encryption catching on? Did DBAs start using Extended Events for monitoring when I wasn’t looking?

And it’s important because I wanna build the right training material and scripts for our customers. I see a problem trending, I want to be able to give people the right information to fix the problem, fast.

When we launched SQL ConstantCare®, I was excited to see what the data would reveal. 86 users have opted into public data sharing for 285 servers hosting 11,521 databases.

Here’s some of the interesting things we’ve learned so far.

SQL Server adoption is still slow.

It’s spring 2018, but SQL Server 2017 still has less adoption than SQL Server 2008, let alone 2008R2. I’m really curious to see how this progresses as we move towards the end of support for 2008 and 2008R2 next year.

SQL Servers by version

In terms of edition, I went in with no expectations – I really have no idea what our readership looks like, and it’s interesting to see numbers:

SQL Servers by edition

Over 50% of us had basic backup issues.

58% of all us had at least one database (37% of databases overall) that hadn’t had a full backup in the last week. Now this is a little tricky: in the initial round of collection, I noticed a trend that a lot of people would add a development server first, then look at the email advice to decide whether they wanted to add more servers.

However, I also noticed a trend amongst the replies – paraphrasing:

“You know, I’d forgotten about those databases. We restored that a while ago to get some data out of it, and then I guess I forgot to delete it. I’ll go delete those now.”

Similarly, 38% of all servers had databases in full recovery model, but weren’t doing transaction log backups on them.

Between dropping databases that shouldn’t be around (shout out to the multiple folks that had AdventureWorks in production), plus suddenly clearing away unnecessarily giant log files, I can see how the database size tends to drop quickly on servers within the first few days of setting up SQL ConstantCare. (We’ll do an ROI study on that over time.)

We have a mixed record on corruption checking.

93% of all databases had a CHECKDB in the last 2 weeks! That’s awesome!

However, things were a lot worse when it came to enabling checksums for page verification. 5% of databases didn’t have checksums turned on, which sounds small, but it was spread across 54% of the users. Look at the person sitting next to you: either this issue affects you, or it affects them. One of you needs to buckle up.

We’re still not patching.

  • 16% of customers are running a completely unsupported build of 2008 or newer (meaning they haven’t applied a service pack in a few years)
  • 10% of customers are running builds with known corruption or security escalation bugs
  • Hardly anyone is patched for Meltdown/Spectre

As a teacher and consultant, I gotta think hard about that. I don’t have easy answers. It’s not like I can just build a presentation and magically get the business to agree to outage windows.

Long term, I’m thinking of it as a data problem: can I tie peoples’ server problems to a specific CU that has a fix for their issues? That won’t be on the horizon for the product in 2018, but it’s an interesting long term challenge.

37% of us are experiencing poison waits.

When RESOURCE_SEMAPHORE, RESOURCE_SEMAPHORE_QUERY_COMPILE, and THREADPOOL strike, it can feel like your SQL Server service is completely locked up – even though you can remote desktop into the base OS and it responds just fine.

When I talk to training classes about that, I’ve been saying that most of you will be able to go your entire career without having to troubleshoot those issues. Turns out I’m completely wrong, and I need to start talking about these more often, like blogging about how to recognize the symptoms even if you’re not monitoring wait stats.

On a related note, 16% of us have had memory dumps recently. My old advice was to install the SSMS memory dump upload utility, but Microsoft shut that down – likely due to security issues around GDPR, since memory dumps can include PII. Before I write new advice there, I’m going to dig deeper into the data – like if the dumps are correlated to specific builds/versions – to improve my advice.

We still have priority boost turned on.

Thanks, SSMS

It’s hard for me to believe that Microsoft still exposes this as an option in SSMS 17.6. Users should be protected from themselves and from really bad Internet advice – this should be deprecated, burned, and pushed to the bottom of the ocean.

Here’s the real kicker, though: ten percent of us have a server with Priority Boost on.

Another way to think of it: when you’re in a user group session with 40 other people, 4 of them have Priority Boost on. Or maybe you, and 3 other people.

The exciting thing is that we can track what happens to wait stats as people turn that feature off, and then prove if it made things better or worse. (Over 20% of us have either auto-close or auto-shrink enabled on databases, too, but thankfully it’s confined to about 11% of our servers.)

I’m barely scratching the surface here of what we’re learning. As the data grows, I’m really looking forward to showing users comparison data of how they rank related to other shops, how their database health stacks up, and the easy stuff they can do to get better.

Read more of my SQL ConstantCare posts, or sign up now.


The Many Mysteries of Merge Joins

Not A Single Picture Of A Zipper

Image humbly borrowed from https://70srichard.wordpress.com/2014/12/17/beverly-hills-cop/

There are some interesting things about Merge Joins, and Merge Join plans that I figured I’d blog about.

Merge joins have at least one interesting attribute, and may add some weird stuff to your query plans.

It’s not that I think they’re bad, but they can be tricky.

Lots of people see a Merge Join and are somewhere between grateful (that it’s not a Hash Join) and curious (as to why it’s not Nested Loops).

Oh, you exotic Merge Join.

E pluribus unum

In a “good” Merge Join, the join operator in the query plan will have the Many to Many: False attribute.

The optimizer knows this because the Primary Key on Id (though a unique index or constraint offers similar assurances) is distinct for each value in the Users table.

Having one unique input give you a one to many Merge Join.

Simple as a pimple

The statistics TIME and IO profile for this query is about like so:

Not too shabby for one meeeeeeeeeeeeeeeeeeeellion rows.

E pluribus pluribus

In a “bad” Merge Join, that attribute will be True.

Hamburger Lady

Why does this happen, and why is it bad?

It happens most commonly when there are, or may be duplicates on both sides of the results. They can also happen when the outer input has duplicates, but quite often the optimizer will rewrite the JOIN order to put it on the inside to avoid having to use a worktable, etc. Thanks to Adam and Kalen for nudging me in the comments to clarify this part.

Internally, the Merge Join will spin up a work table (similar to how a Hash Join operates), and work out the duplicate situation.

The stats TIME and IO profile of this plan looks like so:

If we were to, say, choose that serial Merge Join plan in the compilation of a stored procedure that became the victim of parameter sniffing, we could run into trouble.

Yes, that is seven minutes and forty seconds. A little over half of one metric cigarette break.

Head to head, the large merge (many to many) is costed higher than the smaller merge (one to many). But you won’t see that in a parameter sniffing situation.

You’ll only see the lower costed Merge Join version of the plan.

TELL’EM LARGE MERGE SENT YA

The optimizer will sometimes to try protect itself from such hi jinks.

Aggregation Ruling The Nation

In some cases, the optimizer may inject an aggregation into one side of the join ahead of time to distinctify the data. It doesn’t need to do both — we only need one distinct input for the many to many attribute to be false.

I haven’t seen a situation where both inputs get aggregated merely to support the Merge Join, but it might happen if you ask for other aggregations on the join column.

It can use a Stream Aggregate, which would generally make more sense, since both the Stream Aggregate and the Merge join require sorted data.

Sense and Sensibility

Under less reasonable circumstances, you may get a Hash Match Aggregate. This plan has the additional misfortune of needing to re-order the data for the Merge Join. Teehee.

Hash Gang

If you see this, something has truly gone wrong in your life. This query was heavily under the hint-fluence.

A Sort Is A Sort

Much more common is seeing a Sort injected into a plan to support one or more downstream operators that require sorted input (Merge Joins, Stream Aggregates, Windowing Functions).

For instance, a query like this:

May give you a plan like this:

Sort early, Sort often

In this case, the Sort happens early on to support the Window Function. It also aids the Stream Aggregate, but whatever. Once data is sorted, the optimizer tends to not un-sort it.

The Sort in the next example will be needed with no index on, or where the join column is not the leading column in the index (there’s an If here, which we’ll get to).

If this is my index on the Votes table, data is sorted by PostId, and then UserId

When my query is just a simple join, like this:

My query plan will look like this, with a big honkin’ Sort in it:

Is kill

On the other hand, if my query looks like this:

My WHERE clause filters the leading column to a single PostId (one Post can be voted on by many Users), the UserId column will already be sorted for that single PostId value.

We won’t need to physically sort data coming out of that.

Like mustard

Out In The Street, They Call It Merge Join

I hope you learned some stuff that you can use when troubleshooting, or trying to understand a Merge Join plan.

This is one of many places that the optimizer may inject a Sort into a plan that you didn’t ask for.

Thanks for reading!

ZIPPER FREE!


Building SQL ConstantCare®: The Minimum Viable Product (MVP)

SQL ConstantCare
2 Comments

When you wanna build something, how early do you let the public in? Where do you draw the lines with the features you absolutely have to have – versus the stuff you just want? Code is never really done.

The lines are so blurry these days with labels like Alpha, Beta, Private Preview, Public Preview, Early Access, Limited Availability, Regional Availability, General Availability, etc. Companies wanna get something in your hands as quickly as possible so they can start learning from what you like and what you use. What’s the first thing you actually ship?

MVP: Minimum Viable Product

The Lean Startup by Eric Ries

In Eric Ries’ excellent book The Lean Startup, he talks about how your company should rapidly iterate along:

  1. Release something as quickly as possible
  2. Measure how customers use it
  3. Learn lessons from those measurements, and then
  4. Go back to step 1

The first time you hit step #1, that’s called the Minimum Viable Product (MVP). Your MVP doesn’t even have to be an app or online service – it could be a manual process, or it could even be just a signup form for a service that doesn’t exist yet. In a sense, you could think of sp_Blitz as the MVP: a script that people could run and get advice about their server.

If you were going to build a system to give people advice about their servers, here’s a few ways you could do it:

  1. Totally Manual Process: you build a list of queries, they copy/paste the queries into SSMS, copy/paste the results into Excel, email them to you, you analyze the data manually, and you manually email them a reply.
  2. Automatic Collection, Manual Analysis: you build an application that runs a bunch of queries and dumps the data into files (say Excel). The user emails you the files, and then you analyze them manually, and tell them what to do.
  3. Automatic Collection, Automatic Analysis: you build an app that runs queries, sends the data to you, and then robots analyze the data and send recommendations to the end user.

Obviously, #3 is a hell of a lot harder than #1, and it ain’t minimum.

You’re probably looking at what SQL ConstantCare® does and thinking, “Wait, he screwed up – he jumped straight to #3.” Well, we’d already done #1 early in our consulting practice, and #2 was our SQL Critical Care® consulting service. (Richie had built a data collection app to make our process faster.)

There’s a lot of gray area in #3, though. We made a lot of brutal decisions about what we would do in order to get the MVP out the door.

What we skipped

We wanted to collect data, put it into a database, and send you emails with advice – but everything about the process was up for debate. If we could put something in your hands faster by making some tough decisions, then we made ’em.

  • “Mute” links – I wanna make it as easy as possible for you to permanently mute servers, databases, or specific alerts that you don’t care about. Soon, the emails will have mute links, but for now, we’re having folks just hit reply and tell us what they want muted, and we mute ’em on the back end.
  • Self-updating app – really wanted this for v1 because I figured we’d be iterating fast over the collector & queries, but no go. The components are there, but you have to run ConstantCare.exe yourself manually if you want to get the update. Has to do with permissions gotchas with Squirrel, the updating tool we used.
  • Windows Service (rather than scheduled task) – an always-running service would let this thing work better in a data center environment, but I’m also hesitant to deal with the support gotchas involved with an always-on self-updating service.
  • Team hierarchy – later, I’d love to designate different people for different groups of servers, or different levels of alerts (like production DBA vs developers.) For now, if you want different teams to get emails for different servers, you’ll need to install different instances of the collector.
  • Interactive web site with your data & recommendations – didn’t wanna hassle with logins for now.
  • Shareable anonymized reports – I’d love to give you the ability to pass your DMV data on to your software vendor or consultant and say, “Here’s what we’re up against – you take a look and tell me what you think.”
  • AG/DR-aware checks – if you have an AG, I’d like to be able to identify backup history across all the nodes and tell you if you have gaps in coverage. (As long as we’re getting backup data across all replicas, this should be doable – just takes more query work.)
  • Troubleshooting tools – sure, it’d be nice to have a dashboard showing where all the incoming files are at in processing, but it doesn’t make sense to build something like that when our processes are changing so fast during the MVP.
  • Automated wait stats analysis – for example, “Your server is waiting on storage, but it’s not because the storage is bad – you just need to change max memory because it’s set incorrectly. Change it to ___, here’s how, and here’s how safe (or unsafe) it is.” Right now, I’m doing this manually for customers, building a manual process of what I’m thinking as I do it.

What we shipped

Users install a desktop app that sends us diagnostic data once a day. Here’s a video showing how you install it – warning, it ain’t pretty:

SQL ConstantCare Installation

There’s no GUI, there’s no wizard to add servers for you, and there’s no input validation if you screw up your email address or server names. There’s a lot of places where things can go wrong. I’d love to have an entirely graphical setup process that sweeps the network, suggests the SQL Servers that you would want to monitor, and guesses the right time zones for each server.

I would also like a pony. (Not really. Ponies smell bad. Except this pony.)

But you get what I mean – you go to war market with the army installer you have, not the army installer you want. Besides, this is a one-time experience for users – the more important experience was getting the emails that were valuable and actionable.

The emails – that’s where the value comes in. Right now, Lambda functions analyze the data and send us emails that look like this:

Erik needs to do a better job of backups

They’re the same emails that you’ll eventually get directly – but for now, we’re watching them ourselves, making sure the functions are returning the right recommendations, and then manually sending guided recommendations directly to the customers.

Then for some customers with special situations, or where the automated emails aren’t quite the right fit, we send manual emails like this (sanitized customer email sent to myself):

Sent by Mañuel Llabor

Is that a lot of work? Sure it is – early adopters are basically getting a screamingly cheap deal on our personalized attention via email. If you’re going to automate something big, though, you’ve gotta start by making sure you can do it manually first.

How the MVP is scaling so far

The line is the number of servers we’re analyzing (from customers willing to share their numbers – more on that in the upcoming GDPR post), and the bars are the terabytes of data on those servers. The jump yesterday was the first day of the marketing launch:

Servers and terabytes of data over time

Yes, the terabytes of data have gone down more than once as folks have suddenly realized, “Whoa, someone restored a bunch of databases onto a production server, and we forgot to get rid of them. They’ve just been making our maintenance windows take longer every night.” My favorites are when we’ve found AdventureWorks on production servers.

Those server numbers may not look big – we could have easily processed 156 servers’ worth of diagnostic data with a conventional Windows app running in a VM – but check out these hosting costs:

Amazon Billing Explorer

We spent ~$650 last month on hosting, and is projected to come in around $1k this month. That includes development environments, by the way. This is inexpensive enough that I could afford to absorb it for even just a handful of customers if it hadn’t caught on.

As more customers start to diagnose more servers, serverless really pays off. When new customers join, they seem to wait a day or two to set up collection, then they set up collection for just 1-2 servers, see the results, and then suddenly go, “Whoa, I should add a few other servers in too.” We added 49 new customers yesterday, so I wouldn’t be surprised if we were monitoring a total of around 300-500 servers by the end of the week.

In preparation for that, Richie’s been putting a lot of work into tuning data ingestion and processing the rules quickly, staying ahead of the Lambda function timeouts and the growing data set.

As we go, we’re learning stuff, tweaking the system, and figuring out what things users can fix on their own and which ones they need more help with. It’s even driving the blog posts I write – for example, when I see a problem on several servers in a row, and I have the same discussion with customers about it a few times, that means I need to write a post to link folks to. The results are posts like “But I don’t need to back up that database” and Why Multiple Plans for One Query are Bad.

What’s on the roadmap next

Now that we’ve got the MVP out, we’ve been working on:

  • Quality checking on the automated emails – we’re pretty close to the point where we’ll start letting the automated emails go straight out, but then once users have fixed the first round of problems (like no backups, obvious server misconfigurations, etc) then the human intervention will kick in later. Right now, with ~50 active paying customers and ~150 SQL Servers, it’s easy enough that I can still keep an eye on this manually, but we’ll be switching over soon.
  • Back-patting rules – tying together a recommendation we made, a customer’s successful application of that recommendation, and the difference it made in health or performance metrics.
  • Not breaking the build – seriously, I have a nearly 100% failure rate on my commits. Richie has to be gritting his teeth by now every time he sees one of my pull requests.
  • Wait stats trending – right now I’m manually trending wait stats, then emailing customers an analysis. We’ll need something more scalable as we go to 1,000 servers and beyond. When you build an MVP, though, you gotta do things that don’t scale.

Read more of my SQL ConstantCare posts, or sign up now.


Building SQL ConstantCare®: What Data We Collect (and What We Avoid)

SQL ConstantCare
2 Comments

Next up in my series of behind-the-scenes posts about our new SQL ConstantCare®, let’s talk about the kinds of SQL Server diagnostic data we’re gathering, why, and what data we’re steering clear of for now. (Psst – read through to the end for a huge, crazy discount today, too.)

We – you and I – are paranoid.

We’re a data professionals, first and foremost. During my consulting intro calls, I’m really proud to say that we don’t need to get a VPN connection, remote access, or a SQL Server account. You know how it is – DBA, Default Blame Acceptor – and these servers are already in bad shape. The last thing folks want is some consultant going rogue and “fixing” something, so we make it clear from the get-go that won’t happen.

Because we’re so paranoid, my first thought when deciding to collect diagnostic data was, “I don’t wanna end up in the news.” I bet you feel the same way. Therefore, we thought about how we could give users as much valuable insight as possible from as little data as possible.

I drew a big, clear box around queries, query plans, and statistics and said, “We’re not going here in v1.” That data can produce awesome performance recommendations, but it often contains personally identifiable information. I know some people are willing to share that data on a plan-by-plan basis – the referral-marketing success of PasteThePlan proves it. I even bet a lot of people are even willing to share a lot of their plans on a regular basis as long as they’re getting valuable performance advice. However, I think we can still provide a lot of value without having that data, and we have plenty of Lean Startup lessons to learn along the way first. For now, rather than automating that part of the analysis, we simply tell customers in their advice, “If you’re able to tune queries or indexes to fix this, here’s the exact parameters to use with sp_BlitzCache and sp_BlitzIndex to find the root cause of this problem.”

We started with the only data to run the most urgent sp_Blitz checks.

To pick the data we’d gather at first, Richie took the list of sp_Blitz checks from priority 1 to 100 – the most urgent stuff – and came up with the list of data requirements to be able to send emails like this:

You’d be surprised how many real SQL ConstantCare emails start like this.

To see the queries we run to get that data, look at the queries ConstantCare.exe runs by going to Windows Explorer and opening the folder:

One file per query

The way it works is that ConstantCare.exe connects to the SQL Server, runs these queries, writes the results to an encrypted zip file, and uploads it to a private bucket in Amazon S3.

At this point, you might be thinking, “Why not just run sp_Blitz on the server and harvest the results?” We wanted to design checks that worked with data over time – for example, after we find auto-shrink enabled, and after you turn it back off, how are your wait stats affected? (And yes, we’ve already found over a dozen servers with auto-shrink enabled, hahaha.)

Exposing the queries directly to customers also has a side benefit: later this year, they’ll be able to save override versions of the queries. This way, if there are some parts of the diagnostic data that you don’t wanna send in, you can override that portion of the query (returning nulls, or masked data, or just not return any rows at all.) This presents a little more design and support work for us since missing rows might trigger a false alarm – for example, if no backup data comes in, we don’t wanna say, “OMG, you aren’t backing up your databases!”

Serverless architecture means per-database checks are harder.

We hit a few challenges early:

  • In the past, we’ve used sp_MSforeachdb, but that doesn’t work in Azure SQL DB (and we wanted eventual compatibility with that, although not necessarily in v1)
  • People have a crazy number of databases per server (think thousands)
  • People who have a crazy number of databases also seem to have a crazy number of indexes per database (think thousands per database, times thousands of servers)
  • And they also have a crazy amount of backup job executions (especially if they took my advice about backup frequency)

Moving large files around in the cloud isn’t a big deal, but processing them is, especially with serverless. AWS Lambda functions have a 5-minute timeout, and we use Lambda functions for stuff like:

  • Importing the client’s JSON data into the database – in the first version of the intake process, we were processing incoming files line-by-line – and hoowee, did that not work with the crazy-number scenarios. After hitting timeout walls, Richie later changed it to pull the whole JSON file into the database as a set, plus we held the index analysis back for a later version.
  • Analyzing the data to build email content – just like your homegrown queries look at SQL Server’s DMVs to figure out what’s broken, we’re using Lambda functions to run database queries to look at your DMV data. It’s just that the data is now stored in a central database on our end. Imagine a database server that holds DMV data from lots of customers – we’re gonna run into performance issues as we gain clients, and retain more database history. If your homegrown query takes over 5 minutes to run, you’re the only one who cares – but if our AWS Lambda functions time out over 5 minutes, you won’t get your report, and I care about that.
Our SQL ConstantCare Slack channel before performance tuning

Watching function runtimes and timeouts has probably been one of the more interesting challenges so far. We have a company Slack channel for SQL ConstantCare® where status updates go and we talk about what we’re doing about ’em. We laugh a lot in there. Also, swearing and gifs.

The whole mission and design of SQL ConstantCare® means these timeout errors are less of a concern – it’s a mentoring product, not a real time monitoring product, and users don’t expect to get an email instantly after their data is sent in. (Plus, data is sent in daily, but for now, analysis & mentoring emails only go out weekly.) The database size is less of a concern, too – during the initial Minimum Viable Product (MVP) phase of the project, we could even delete and redeploy the database from scratch whenever we want, and the product still worked fine.

Read more of my SQL ConstantCare posts, or sign up now.


[Video] Office Hours 2018/3/28 (With Transcriptions)

This week, Erik and Richie discuss what to do first at a new job that never had a DBA before, basic vs advance DBA skills, tempdb files, downgrading SQL server versions, a database architecture issue, tools for monitoring SQL Server instances, other areas to focus on as a DBA besides HADR and query tuning, SQL Constant Care®, and database backups.

Here’s the video on YouTube:

Office Hours Webcast - 2018/3/28

You can register to attend next week’s Office Hours, or subscribe to our podcast to listen on the go.

Enjoy the Podcast?

Don’t miss an episode, subscribe via iTunes, Stitcher or RSS.
Leave us a review in iTunes

Office Hours – 3-28-18

 

What should you do on your first day on the job?

Erik Darling: Let’s see, I’m going to try this name, but I’m not going to do well at it, Seybou – I like that you have B-O-U in your name; that’s Brent Ozar Unlimited – asks, “What should you do first when you start a new job at a company that never had a DBA before?” My man, backups, backups, backups, backups, backups, backups – make sure that the backups are in order before you do a damn other thing. If you don’t have those, like, it doesn’t matter if you’re checking for CHECKDB, checking for corruption, it doesn’t matter what else you’re doing in your life, if you don’t have those backups, you are missing the biggest piece of the DBA puzzle. What do you think, Richie?

Richie Rump: Oh no, that’s exactly right. So in fact, we don’t even do consulting gigs unless we have backups from the client. And we actually go in and check and make sure that they have valid backups for everything. And there’s been times where we’ve actually cancelled gigs because they haven’t had valid backups. So we’re practicing what we preach; that’s all I’m saying.

Erik Darling: Not even like, you know, the situation where it’s like, they’ve never taken a backup. It’s like, if their backups are like more than a week or ten days old, we’re still like, you have to take one like now because we don’t know what was in that – we don’t know what was going on the last time you did anything. I feel like a nerd now, everyone can see my EpiPen. I went to get my allergy shot yesterday and I have to bring this thing with me to get my allergy shot, or else they won’t give it to me because I might, I guess, die in the office.

Richie Rump: That kind of reminds me of the old bear skit form SNL, you know, that’s my fourth heart attack this week.

Erik Darling: That was a good one. That was, like, one of the last funny things SNL ever did.

Richie Rump: No, no, no they’ve been doing some pretty good stuff recently. I haven’t been watching it live but I’ve been watching it on digital and they – Bill Hayder was on this week and he broke like five times; it was great. He just kept breaking. He was like, “I’m not a cast member anymore. I don’t have to worry about my job. I’m going to go ahead and break.” Cracking up in the middle of his stuff.

Erik Darling: I don’t know, unless he’s trying to come back on SNL. But to get back to Seybou’s question a little bit, so after backups, of course, CHECKDB and making sure you have corruption checks running, after that, everything else is kind of less important as business priorities. So make sure you you’re meeting RPO and RTO goals, then after that, make sure you’re meeting whatever SLAs you have with customers, and then after that you can start focusing on stuff like performance, whether it’s query or index tuning.

Please, feel free – I mean, it’s free, give sp_Blitz a run. Head over to our first responder kit, go grab a recent copy of sp_Blitz, give it a run, see if there’s any surprises. Best way to do things, I think; it’s neatly packaged up for you.

Richie Rump: We have this new product that we’ve just been slowly rolling out called ConstantCare. And ConstantCare will actually go in and do all of that stuff for you. It will go in – send us the data every day and we’ll send you an email. Right now we’re once a week; eventually, it will be once a day and it will actually tell you, “Hey, this one backup didn’t run,” or it will tell you go do this or go do that. So that is an option and you don’t need to worry about running scripts or anything like that, you do one install and it does it all for you. So obviously, it’s something near and dear to my heart, because you’re my life, but yes, since we’re not getting the word out, I can now talk about it.

Erik Darling: Finally, like a think that you’ve been working on almost since you first started here and everyone’s just like, what the hell is Richie doing? Well, now we know.

Richie Rump: What is he doing?

Erik Darling: Just Paste The Plan, really? That’s it?

Richie Rump: I know, and he hasn’t done anything in like a year, what’s up with that?

Erik Darling: And it barely works – just kidding, it works fine.

 

What separates a junior vs senior DBA?

Erik Darling: Anyway, let’s see, “I’m currently a BI developer but I’m interested in becoming a DBA as a future goal. What aspects of DBA do you consider basic versus advanced, i.e. DBA versus senior DBA? For example, would you put replication HADR in the basic or advanced group?” Well, there’s a basic and advanced level of knowledge with anything, so what I would do is forget about replication immediately. Don’t pursue that. it’s really not the future – it’s not an HADR feature, for one and it’s not really the direction that anyone’s going to be heading in if they need HADR. As SQL Server progresses and Availability Groups, and now basic Availability Groups on Standard Edition get, you know, more robust and less brittle, that is going to be the pretty obvious, you know, the HADR solution for just about everyone moving forward.

But like I said, there’s a basic and advanced level of knowledge with anything, you know. Knowing how to create an index is a whole lot different from knowing how to tune an index. The same thing goes for query writing, you know, and knowing how to write a query is a lot different form knowing how to tune a query. So it really depends where you see yourself as a DBA.

The days of the Swiss Army Knife DBA are slowly drying up. If you want to really be senior at anything, you have to specialize, like a son of a gun, and you have to get really good at something. There are just simply too many products, too many features and too many quirks and too many other things for someone to really have a senior level of knowledge across the board. I’m sure there are some people who can get that, but they’re people who have had those building blocks for years now. Like if you take someone like Bob Ward, who will be able to troubleshoot an Availability Group, tune a query, figure out indexes, you know, run a debugger, do stuff in extended events. There’s obviously the renaissance men out there who can do all that, but one of my absolute heroes in the world, Paul White, probably knows little about Availability Groups but knows the optimizer intimate.

So if you want to be a senior DBA, you have to specialize, you have to figure out what you’re passionate about as a DBA. If it’s the HADR stuff then you have your direction there. If it’s the query and index tuning then that’s your direction for that.

Richie Rump: Yeah, and now, since this is a future goal and you’re looking into becoming a DBA, then you still need to understand the concepts so you could actually talk intelligently about them. So if you’re going for a job and you’re not focusing on replication but they’re doing replication, you at least want to have a conversation about that and understand why replication isn’t the future and why this other technology is. So, you know, don’t avoid it but don’t go in depth if it doesn’t interest you.

Erik Darling: Obviously, you know, DBAs need a certain level of knowledge about little things, even if they’re not going to specialize in it. Like, if you’re going to call yourself a DBA, you should at least know how to take a backup and do a restore without Googling too much. But, you know, if that’s not your primary goal as a DBA – if your goal is not the backup and restore infrastructure person then obviously you don’t need to know every single in and out about buffer counts and max transfer sizes and all that other stuff. So really pick the route that you want to take as a DBA, or, you know, it seems like a pretty solid decision as a BI developer now. Take your time in figuring out what you really like about the DBA role and then pursue that in a seniorly way, I guess. I don’t know how else to say that.

Richie Rump: No, I think it’s good.

 

Do I need more than one TempDB data file?

Erik Darling: Good, I got passing grades from Richie so I’m ready to ask the next one. Steve Malcolm, who we’ve been talking to via email, is still confused about tempdb files. Yes, Steve, you need your NDF files, those are secondary data files. Since this is tempdb, you’re going to want to have more than one data file.

Steve asks – Steve sends through an email yesterday to us asking why he had multiple tempdb log files, but it turned out – we’ll get to it in a second, but it turned out that they were both data files. Now, with 2016 and 2017 SQL Server setups, something that’s changed from prior versions of SQL Server, which used to just give you the one tempdb file, is during the setup process there’s a screen that comes up that looks at how many CPUs you have in your system and it gives you one tempdb file per core up to, I think, like eight, then it stops there.

Then the bottom line of that is, like, the link I sent you over about configuring tempdb yesterday, configuring multiple data files on tempdb is generally a wise thing to do. SQL Server can use those files in a parallel manner, it can write to a whole bunch of them at once, which can reduce contention and all sorts of other stuff if you have a busy tempdb. The nice thing about 2016 and 2017 is they made the behavior of trace flags 1117 and 1118 the standard; that’s the default behavior now so you don’t have to turn those on anymore. No, you don’t. Alright, let me catch my breath for a second here.

Richie Rump: I don’t know whatever you just said, it sounded good to me, but, you know, again, I’m not [crosstalk] that that guy is… But then again, I couldn’t get, you know, you to deploy a serverless application either, so.

Erik Darling: No, no and then, you know, that’s the wonderful thing is that Brent can find people who can do the little bibs and bobs of exactly what he needs. He has you if he needs to do some crazy serverless development task that I would have no hope of ever getting past, like, figuring out what to do for step one on. He has me and Tara for the consulting and the DBA type stuff. He’s got things pretty solid; at least I hope he does, anyway. If he doesn’t, I don’t know…

Richie Rump: What do we need Brent for? I think a lot of things, actually.

Erik Darling: He’s like dad; we bring him along because he buys stuff, right.

Richie Rump: Yeah, yeah that’s right, yeah.

Erik Darling: Pays for the movie tickets – he’s a nice guy. He’s fun – he’s fun.

 

I need to downgrade Enterprise to Standard…

Erik Darling: Let’s see – oh boy, Doug says, “I have a situation…” Not our old Doug. “I have a situation where I need to downgrade SQL 2016 Enterprise to Standard. If I maintain copies of master, model and MSDB and after uninstalling…” Oh boy, jeez. You know, I’ve never had to go through that process. I would highly, strongly, firmly suggest that you try that somewhere because I don’t even know that you would be able to use those copies.

Richie Rump: But it’s the same version, it’s just different editions. I would – yeah, definitely test it out. I’m thinking it probably would work. I don’t know.

Erik Darling: I would want to test that out. I would be concerned about all sorts of things with that. I wouldn’t want to rely on that for a downgrade.

Richie Rump: The first thing that came to mind is that you would just move all your objects over, SSIS or something, you would move the data. But it’s the same version, just different editions, so maybe…

Erik Darling: Maybe, but here’s what I would want to do instead. I would want to side-install. I wouldn’t want to uninstall the first one completely. I would want to side-install Standard edition. And this is actually – a good use for PowerShell is the dbatools.io people have written all sorts of command lets to like copy settings and stuff from one server to another. It’s not quite desired state configuration, but it’s a good – like, you know, if you need to like migrate a bunch of stuff over all at once, it might be a good option. There are a whole bunch of command lets over there that can do that and other cool things.

So head over to dbatools.io, side-install Standard Edition and then script out whatever stuff you want to move over to there. I wouldn’t rely on the uninstall-reinstall and like either have SQL 2016 pick up on the new files or try to like restore stuff over. That just sounds like a nightmare to me. If you do that, you have absolutely no back-out clause. You are stuck. You are hosed with whatever setup you get yourself into. If you do the side install and you just script out the stuff that you need to move over, I think that puts you in a much safer place, even if you have to…

Richie Rump: And if this is a virtual machine, then just spin up a new virtual machine and just do it that way. But wow, good luck because that sounds like a rough one.

Erik Darling: Yeah, that’s not fun at all; the stuff that does not make me want to have a real job ever again.

 

Erik Darling: Alright, Pablo asks an interesting architectural question. “I have a view that gets three billion rows to make queries there.” I don’t know what that means exactly. “Those rows are across 40 plus databases making unions. Is this structure okay? May I have less databases or just one to avoid the unions?” This is generally the kind of thing that one would want to avoid.

Richie Rump: Yeah, we’d consider that not a best practice.

Erik Darling: Yeah, it’s a little bit more than one could diagnose without looking at things, you know, in a 30-minute free webcast, but generally that’s not a design pattern that we would want to put folks into. If you’re doing like the one database per client thing, I’m totally on board with that. I think that’s a much better pattern than the giant database teaming with everybody even if they’re separated by schema or whatever other crazy things. I think that’s a much smarter way to do things.

If you really need to query data across all the databases, perhaps like a database that you pull data from rather than having to execute queries across all of them at once would make more sense. Like, you could have just a data-dump database and you could have SSIS just move stuff in incrementally during the course of whatever day or period you need. Then again, if what you have written isn’t slow or just not causing any problems, don’t change a thing until it causes problems.

Richie Rump: Yeah, when I think about joining three billion rows in a union…

Erik Darling: That might slow things down.

Richie Rump: yeah, I don’t know what you’re trying to do, but trying to do that with a view would give me the caution to say, wait, what are we trying to do, for one, and is there a better way of doing it than in a view? Maybe there’s a different way we could go about this. I’m assuming that because you have 40 different databases, it’s the same database but all for different customers and those are tough architectures, especially when you’re trying to join them all together and do data off of them.

Typically, you want to put them all into one database, but there’s different solutions for different things that you’re trying to do. So three billion rows in all different database and trying to get them synced together and all that, that’s tough as well – so that’s a tough one but yeah, it all depends on what you’re trying to do and what data you’re trying to [inaudible] from and what information.

 

What do you use to monitor SQL Server?

Erik Darling: Cool, alright, Sree asks a question. And I swear to god, this is a question, not a plant so that we can talk about ConstantCare again. It is in the transcript. I’m looking at it right now. My name is not Sree and as far as I know, neither is Richie’s. “What kind of monitoring tools do you use to monitor SQL Server instances. Do you have any thoughts on the Grafana Dashboard and alerting?” Well, no I don’t and it’s not because I think it’s bad, it’s just because I’ve never used it or seen it. I don’t really know.

So Brent’s been posting some great posts, Brent, lately about our pseudo monitoring tool. But like he says, it’s more of a mentoring tool. So you have, you know, a good set of vendors out there; SentryOne, Quest Spotlight, people like that who have a cool monitoring tool with a good dashboard that gives you pretty similar metrics across a whole bunch of different areas in SQL Server. But the direction we went is a little bit different because with any monitoring tool, you need to A – figure out how to use the monitoring tool, figure out how to dig into it deeper and then learn how to solve the problems that are causing the monitoring tool to show you to these things.

The direction we’re going with it is, like Brent calls it, a mentoring tool. So what we’re going to do is we’re going to get all the data that a monitoring tool would, we’re going to analyze it, we’re going to do that work for you and then we’re going to send you an email that tells you which buttons to push and which things to do to try to solve those problems. So it’s a little bit different take on monitoring. There’s no dashboard involved. There is no – you don’t get to, you know, look at uptime, downtime, green lights, red lights, you know, charts and graphs and all that stuff. But you also don’t have to learn how to read all those charts and graphs and do all the work behind the scenes.

Richie Rump: Yeah, some of those metrics always made my eyes just cross. Oh, it’s red, is that bad? Is that really a problem? Having all those wait stats, is that a problem? What does that actually do? And then you have to figure out, okay, well what does it actually do? And then you have to do more investigation – is that actually a problem for our server?

Erik Darling: A lot of monitoring tools will just show you whatever is high. So it’s like, if some completely meaningless metric just happens to spike up, like buffer cache hit ratio or like, I don’t know, what’s another weird one? I don’t know, stuff like that. Stuff just pops up and it’s like, oh my god, freak-out. And you’re like, what does that mean? And you’re like, I don’t know, is it bad? How many page lets do I have? I don’t know, what’s going on? You just get so confused staring at all these different numbers that are just changing all the time and you have no idea if they’re good, bad or ugly.

Richie Rump: Yeah, and you know, maybe you had all these extremely high page splits and, oh my gosh, we have a problem, and yet you did a huge load into the database the day before and now you’re like, oh I’ve got a problem. Oh, but wait, there was a load that went in…

Erik Darling: Fun fact, SQL Server counts new page creations as a page split; ha-ha. No one’s looking into that. You insert a million rows, it’s like, oh no, I have a million page splits, dear god, what am I going to do? Funny…

 

What areas should a DBA focus on?

Erik Darling: Chris asks kind of an interesting question, “What are other areas to focus on as a DBA beside HADR and query tuning?” Well, I guess dev ops might be a thing.

Richie Rump: Starting to.

Erik Darling: I know thankfully little about dev ops so far, so I’ll let Richie expand on that one a little bit.

Richie Rump: Oh thanks, sir, so much. I’ve worked on a couple of dev ops teams. Essentially, it’s the idea of…

Erik Darling: Are you a script-master?

Richie Rump: I am a certified script-master, oddly enough. I have my Agile certification, which means I could read, kind of. It was the dumbest test ever. It was like, the answers were in the questions and it’s like, you think some Microsoft tests are bad, woo, certified script master was even worse. But essentially, it’s the idea of you’re merging your development and your operations together. So as opposed to throwing things over a wall and letting the DBAs handle it, the DBA works very closely with his own team, or is usually on the development team and they kind of work together for these things.

So your dev ops would be, kind of, a merge of a database developer and a DBA kind of into one bundle. They typically would know lots of scripting languages and, kind of, handle things together. They do things like dashboards and monitoring and make sure everybody understands the state of the system and things like that. So yeah, I mean, it’s an option of a way to go. Someone like myself, I could be considered a database developer. I consider myself a data-veloper; you have to pay me if you want to use that. but essentially…

Erik Darling: Copyright it.

Richie Rump: Exactly, so I use [crosstalk]… But essentially, I stand in the middle between the development team and the data team and I make sure all that stuff in the middle kind of goes very fast, whether it’s using ORMs or software or whatever that is. So I kind of have to do both the software and the development side. Not a lot of us guys around, so, I hear it’s a growth industry. You may want to look into that.

Erik Darling: Yeah, software is a big thing apparently; I never would have noticed.

Richie Rump: You know, not a lot of us software developers have opinions. I need you to know that. we’re really easy-going guys. Oh, you like that? Oh, I like Visual Basic, isn’t that great?

Erik Darling: Yeah, let’s high-five and hug and let’s like talk about it civilly over lunch. Let’s not talk about, like, the weird version to version issues that we have.

Richie Rump: yeah, let’s meet on Stack Overflow. I’ve got nothing but great things to say about your question.

Erik Darling: Yeah, it’ll be fun. I love the way you handle nulls. How fast can you serialize that array? I don’t know…

 

Do I need ConstantCare if I have a monitoring app?

Erik Darling: Alright, Daryl asks a sort of follow up question to the monitoring thing. He says, “I already have a monitoring service 24/7. What would be the point of ConstantCare? Would I want to keep my 24/7 service?” It’s not a replacement for a 24/7 service. This is something that collects data once a day, but right now it checks in with you once a week. Like Richie said, it will eventually move to once a day, where we look at your server, we try to see what’s good, bad, and ugly about your server and we send you an email so you know – and like, we teach your – we have like training videos and stuff so, like, we send you an email. We say, hey, this is what we saw on your server, these are the things going on, these are the things we think you should fix. Here’s how to fix it with, like, you know, links to training videos, blah, blah, blah; stuff like that. So it’s not a replacement for a 24/7 service. It’s going to give you, like, up, down, servers on fire type stuff. What it is, is it’s going to take a look at a server over time and figure out if it’s doing better or worse and how you can make it do better.

Richie Rump: Yeah, I like the way he put it as, it’s a mentoring tool. And some of the feedback we’ve gotten from some people, it’s like, I’m a solo DBA and it’s like I have a co-worker sitting next to me, you know, so I can kind of throw ideas off of because you have a certain level of access to us as well through the service. It’s new, it’s different. I would suggest reading the post that Brent is putting on the site every Monday. I think there’s three of them out already. I’ve been way deep in the code, so even when I read some of that stuff, it’s like, oh okay, yeah that’s right, and I’ve been working on it for a year. So if you want to talk more about this service and development and what it takes and all that stuff then I’m your guy. But if you want to know about what it can do for you and things like that, those posts are phenomenal.

Erik Darling: There’s a whole bunch of new ones coming out.

Richie Rump: Every Monday, I think, they’re going to be dropping, right.

Erik Darling: yeah, for a wile anyway.

 

If I need to shrink my database…

Erik Darling: So we’re going to finish up, Tom asks, “Is it okay to do a Full Backup then switch the database to recovery mode simple…” You’re lucky Tara’s not here to hear you say recovery mode, “Run DBCC SHRINKFILE and then switch the recovery model back to full?” bad news, you won’t be in full recover model again until you take a full backup after you switch it back to full, so I don’t know. Don’t shrink your databases, just please stop doing it. It’s a bad idea. It’s not a good idea. Like, since 2009 it’s been a known bad idea. Don’t so it anymore. Please stop.

 

Can I use SETUP to change editions?

Richie Rump: Actually, before we drop, Stephanie had a suggestion saying that, “Setup.EXE has an option to downgrade editions,” which I did not know.

Erik Darling: I’ve never seen that either. Congratulations Stephanie.

Richie Rump: So maybe it’s as easy as just running setup.EXE again and just say downgrade this sucker.

Erik Darling: There you go. Ooh, but that would be interesting if they’re using Enterprise features. So if you were using TDE, you wouldn’t be able to downgrade.

Richie Rump: Yeah, TDE or – well, I guess now even if they’re using partitions, but that’s free now too, right?

Erik Darling: SP1, yeah, 2016 SP1. But there’s a whole bunch of stuff that’s still not, like, available to everyone. Alright, well we should go now; get back to work. Richie has cloud stuff to do. I think I have other things to do too, so…

Richie Rump: I’ve got database tuning.

Erik Darling: Wahoo, you do more of that than I do these days.

Richie Rump: Yeah, it’s all the bad code that I’ve written.

Erik Darling: Ah well, someone’s got to write it. Alright folks, thanks for showing up. We will see you next week; goodbye.


“But I don’t need to back up that database.”

You’ve got this database that you swear doesn’t need to be backed up, like:

  • Read-only databases
  • Schema-only databases with no data in ’em
  • Archive databases that haven’t changed in years
  • Maybe even system databases like master/model/msdb

And you think you’re going to save some time in your maintenance windows by skipping backups and corruption checks on those databases.

Lemme tell you a few stories. Everything’s been abstracted to protect the not-so-innocent.

The Successful Failed Deployment – RetailCo needed a software package to manage their inventory, so they bought one from SoftwareInc. RetailCo set up the server and did the backups, but SoftwareInc’s installer absolutely required SA permissions, so they got it. Over time, one of SoftwareInc’s deployment scripts had a bug: it didn’t change the default database, so they accidentally deployed some of their tables in the master database. The app continued to work perfectly – but real user data was going into master. RetailCo wasn’t backing up the master database, figuring they’d just move the user databases over to another server. You can guess what happened when the server crashed.

The Mostly Read-Only Database – The data warehouse archiving strategy involved one database per year with databases going back over a decade. The older databases were marked read-only, so the DBA figured he didn’t need to back them up often. After a migration to a set of brand-new, way-faster SQL Servers, user complaints started coming in about incorrect query results. Turns out each database had views with tax logic in it, and every time the developers did a release, they’d alter the archive databases to make them writable, change the views, and then alter them back to read-only. No big deal – just now apply the same thing to the archive databases again, right? Well, the project manager was white as a sheet and said, “I’m afraid we need to talk about how we’ve been doing something similar to correct product categories for the accounting team.”

The Truly Read-Only Database – The DBA was absolutely positive no one could write to the archive database because he’d done a great job of locking down permissions. He backed up the monster 10TB archive database once, and then faithfully ran CHECKDB on it as often as he could – not too often, given the size. Years later, the CHECKDB failed – because after all, the SAN can still write to the database regardless, and sometimes, storage gets a mind of its own. The DBA said no problem, I’ll just restore it from backup – and then found out that backups over two years old had been deleted as part of the company’s compliance projects. (Yes, in some private companies, folks actually have an obligation to delete data to stay in compliance.)

The Priceless Schema – A SaaS company came up with a creative version management solution. Customer data lived in one database – call it DataDB – but stored procedures & views lived in another, call it LogicDB. With every release, the app built a brand new LogicDB database with all the new stored procedures, views, etc – like LogicDBv1, LogicDBv2, LogicDBv3, etc. When a SaaS customer migrated to a newer version, the support team just changed the default database for that customer’s SQL login, and presto, it started using the new objects. Seemed brilliant at the time, letting customers live on different version levels while calling the same stored procedure names. The DBA didn’t bother backing up the Logic databases – there had been hundreds built up over time, and they could all be regenerated by source control, right? And the DBA had been using a script to sync logins from production to disaster recovery, but she didn’t think about syncing their default databases, too (and the support team was constantly changing them based on what version of the app they were on.) As a result, when they failed over to DR, it was more D, less R.

Brent’s rule: on a production server,
every database is a production database.

Don’t take the initiative to cut corners around data. Your first obligations are to serve and protect the data. Start there first, not with “time-saving” or “cost-cutting” ideas that don’t really accomplish either.


First Responder Kit Release: Driving Miss Data

Ugh. Bungabase slow.

When Grog try count meat, Glug eat meat.

Then Grog get hungry and Glug try count meat.

Bungalock.

Need big head make bungabase fast.

You big head?

You can download the updated FirstResponderKit.zip here.

sp_Blitz Improvements

#1459: A while back, Brent blogged about implicit transactions. In the comments, a smart commenter named Dan pointed out a way to find that with a couple DMVs. If you’d like that information to be available in more places, please consider voting for my Connect item. This check was also added in different ways to BlitzWho and BlitzFirst.
#1483: Add CPU speed to the power settings check.
#1481: Brent made stuff compatible with Managed Instances. Hopefully that holds up.
#1465: Someone had a problem in German. This is an infrequent occurrence.

sp_BlitzCache Improvements

#1516: Added support for correctly displaying Unicode database names ?_?
#1489: We had a whole heck of a lot of checks running. It’s not that they weren’t useful, it’s just that they created a lot of noise, and may not all have been immediately actionable. All that XML parsing was starting to slow things down. I took a bunch of them and changed the code so they only execute if you run it with @ExpertMode = 1.

sp_BlitzFirst Improvements

#1459: See note in sp_Blitz

sp_BlitzIndex Improvements

#1513: Added support for correctly displaying Unicode database names ?_?
#1509: The column names column in the #statistics table is now NVARCHAR(MAX). This was reported to us in the Slack channel! Thanks to Sanket for letting us know.

sp_BlitzWho Improvements

#1459: See note in sp_Blitz

sp_DatabaseRestore Improvements

Nothing this time around

sp_BlitzBackups Improvements

Nothing this time around

sp_BlitzQueryStore Improvements

Nothing this time around

sp_AllNightLog and sp_AllNightLog_Setup Improvements

Nothing this time around

sp_foreachdb Improvements

Nothing this time around

PowerBI

Nothing this time around

sp_BlitzLock

#1500: Column Store deadlocks generate slightly different XML. We now account for those.

#1469: Fixes an issue where come deadlocks were only counted once. Thanks to @HolisticDeveloper!

For Support

When you have questions about how the tools work, talk with the community in the #FirstResponderKit Slack channel. If you need a free invite, hit SQLslack.com. Be patient – it’s staffed with volunteers who have day jobs, heh.
When you find a bug or want something changed, read the contributing.md file.

When you have a question about what the scripts found, first make sure you read the “More Details” URL for any warning you find. We put a lot of work into documentation, and we wouldn’t want someone to yell at you to go read the fine manual. After that, when you’ve still got questions about how something works in SQL Server, post a question at DBA.StackExchange.com and the community (that includes us!) will help. Include exact errors and any applicable screenshots, your SQL Server version number (including the build #), and the version of the tool you’re working with.

You can download the updated FirstResponderKit.zip here.


Hash Join Memory Grant Factors

Buskets

Much like Sorts, Hash Joins require some amount of memory to operate efficiently  — without spilling, or spilling too much.

And to a similar degree, the number of rows and columns passed to the Hashing operator matter where the memory grant is concerned. This doesn’t mean Hashing is bad, but you may need to take some extra steps when tuning queries that use them.

The reasons are pretty obvious when you think about the context of a Hash operation, whether it’s a join or aggregation.

  1. All rows from the build side have to arrive at the operator (in parallel plans, usually after a bitmap filter)
  2. The hashing function gets applied to join or grouping columns
  3. In a join, the hashed values from the build side probe hashed values from the outer side
  4. In some cases, the actual values need to be checked as a residual

During all that nonsense, all the columns that you SELECT get dragged along for the ride.

Here’s a quick example!

This query doesn’t return any rows, because Jon Skeet hadn’t hit 1 million rep in the data dump I’m using (Stack Overflow 2010).

Despite that, the memory asks for about 7 MB of memory to run. This seems to be the lowest memory grant I could get the optimizer to ask for

Hashtastic

If we drop the Reputation filter down a bit so some rows get returned, the memory grant stays the same.

That’s why I’m calling 7MB the “base” grant here — that, and if I drop the Reputation filter lower to allow more people in, the grant will go up.

Creepin and creepin and creepin

But we can also get a grant higher than the base by requesting more columns.

ARF ARF

This is more easily accomplished by selecting string data. Again, just like with Sorts, we don’t need to actually sort by string data for the memory grant to go up. We just need to make it pass through a memory consuming operator.

Thanks for reading!

Brent says: you remember how, in the beginning of your career, some old crusty DBA told you to avoid SELECT *? Turns out they were right.


Office Hours Guest Instructor Month

Company News
0
Andy Leonard

In our weekly Office Hours, we hang out and answer questions from you, dear reader.

In April, we’re trying something new: guest hosts. If you’ve got a topical question or something you want to ask a specific person, now’s your chance:

Register now.


Missing Index Impact and Join Type

Indexing
8 Comments

Just Another Way

No matter how you delve into missing index requests — whether it’s the plan level, DMV analysis, or (forgive me for saying it), DTA, the requests will generally be the same.

They’ll prioritize equality predicates, the columns may or not may be in the right order, the columns may or may not be in the right part of the index, and the impact…

Oh, that impact.

It’s all just a cry for help, anyway.

Like a teenager watching anime and buying intricate parasols.

Salted Grains

If I run these three queries with different join types:

They’re all going to ask for the same missing index:

Kinda weird already, that a join column is an INCLUDE, but hey.

What’s even weirder is that they have diminishing estimated impacts based on join type.

  • Loop: 99.1954%
  • Merge: 42.7795%
  • Hash: 28.7901%

It gets a bit stranger if I force parallelism!

Now the estimated impacts look like this:

  • Loop: 98.9569%
  • Merge: 29.1982%
  • Hash: 44.2455%

The Hash and Merge join impacts have just about changed places.

The funny thing is…

I totally agree.

As far as indexes go, that’s a crappy index for the Merge and Hash Join plans. But no better one is being offered, not even sneakily.

For the nested loops plans, it’s super easy to grab the UserIds for that date, and dig into the Posts table for just those.

For the merge join plans, it’s less helpful. While it’s nice that we can easily filter the date predicate, we still have to order our data for the merge join. There are further complications in the parallel version.

For the hash join plans, it’s a similar situation. We need to create a hash table on UserId. It being in the leaf of an index on Date doesn’t help us much, or rather as much as it would if it were in the key.

In short, both the merge and hash join plans have cost-increasing operators thrown into the mix that the index as requested just wouldn’t help.

Thanks for reading!

Brent says: in the Mastering Index Tuning class, these types of examples are why I tell students that you should look at one-key-column index suggestions carefully. In most cases, SQL Server really needs one (or more) of the included fields to be in the key. The hard part is figuring out which one(s) without looking at the execution plans.


Building SQL ConstantCare®: Product, Packaging, and Pricing

SQL ConstantCare
8 Comments

Our new SQL ConstantCare® is our first paid online service – it tells you what you should do next on your SQL Servers. I’ve written about the vision and the architecture design, and this week I’m continuing to share the behind-the-scenes planning.

When you buy it, how exactly should we charge you? There’s a few options:

  • Priced per-server (say, each server is $X per month)
  • Priced per-user
  • Priced per-tier (maybe basic reports are $X, and more in-depth reports are $Y, or maybe it’s based on the frequency of reports)
  • A combination of the above (like $X per user for the first Y servers, then $Z for unlimited)
  • Freemium – free for up to X servers or users (and freemium could be mixed with either per-server or per-person pricing)

I ruled out per-server pricing early on because I wanted users to be excited to use this on all of their servers. I’ve talked to a lot of monitoring software customers who said things like, “We have 1 license of our monitoring software, and we move it around from server to server based on what’s having problems.” Screw that – I need to get data on your server when it’s healthy, before the problems strike, so I can identify things like good execution plans before they go awry.

I kinda wanted to try freemium. Because we built it with cost-effective serverless technologies, we could really afford to process the data for tens of thousands of SQL Servers for free. However, that’s only the processing – not doing tech support or improving the service. Those cost money, and I wasn’t sure I’d be able to convert enough free customers to a high enough premium tier to pay for that support & development. (Remember, I’ve got the ISV background, and I know how expensive those departments can be.)

One could also argue that there’s a real value in getting performance & health data on all of the world’s SQL Servers, and then using that data to identify the right customers for consulting and training. It might even make us a strong acquisition target because we’d have data that other ISVs, consulting companies, and cloud hosts might want. However, companies probably don’t want to acquire a list of customers who aren’t even willing to pay $10 to make their SQL Servers faster or more reliable. And worse, I didn’t wanna be acquired out of desperation because we’d run out of money, and that seems to be a recurring theme with companies that get acquihired.

Besides, in a way, I could kinda do limited-time-freemium by doing a Free Server Friday, running sales, giving folks a free trial for a week, or even using it as part of our annual salary survey. One of the things I loved about WooCommerce‘s pricing & coupon handling is that there’s a lot of flexibility.

I settled on simple per-user pricing for the MVP (Minimum Viable Product – a concept in Lean Startups, something I’ll be talking about later.) As we add features down the road, we can add additional tiers like Enterprise, ISV (who needs to watch their customer servers), or Consultant (yes, I want freelancers to be able to use this to understand what’s happening on their clients’ servers.)

Then I had to decide what was included.

The beauty of controlling your own services and content is that you can figure out how you want to bundle them together for the best customer value. We do consulting, training, and now online services – so what was the right way to sell the new service? Should it be standalone, or bundled with:

We might just let you pay with these
  1. Access to our video training library (because the emails might point you there, telling you in more details about the task you need to perform on a server)
  2. Access to our live online classes
  3. Access to our consultants – could be private emails, or via our Slack channel, or comments on your PasteThePlan plans, or X hours or WebEx time per month – which could be expensive for us, but as Paul Graham suggests, try things that don’t scale
  4. Access to entirely new stuff – like perhaps deeper levels of analysis for PasteThePlan – I ruled this out because I didn’t want to spend any more development time than I had to, at least not yet

So say for example we give you access to #1, but we charge for #3 – here’s what one of the emails might look like:

You’re currently not running CHECKDB, and based on your data volumes and your low server horsepower, I don’t think you’re going to be able to start. For now, you should set up a weekly CHECKDB job with the PHYSICAL_ONLY option. Here’s a video explaining how to do it, and if you want to learn more about the mechanics behind my recommendation, here’s a 1-hour video that goes even deeper. If you’re not confident in how this works, click here to buy & schedule a 30-minute call with the next available consultant.

Or, if I had a much higher price and included a set number of consultant calls per month, that last sentence might look like this:

Got questions about any of your homework tasks this week? Click here to schedule a 30-minute session with the next available consultant. You have 3 remaining free sessions this month.

For the first version, I decided to keep it simple: email advice, plus access to all of our self-paced recorded video training. In the future, we might add in a Pro level with bundled pricing for access to our consultants, or even refer folks to other consulting firms that are good fits for the problems they’re facing.

The packaging and pricing affected our other products, too.

Oh you’ll know when the marketing launch hits, trust me

We’re taking a relaxed approach to launching this, just kinda putting it out there on the site. We figured we’d learn a few quick lessons about the install process, troubleshooting, email quality, market interest, and how it’d affect our other products. (And we did!)

Our recorded videos used to be available at various monthly subscription prices. We started by throwing all that out and boiling down to just one product, SQL ConstantCare®. This way, as you’re going through the video training, you can get tips on what modules you should be watching that will be most relevant to your own servers. (We have so much video training that people often ask what they should watch first – this solves that problem.) This helped us learn if search visitors were interested in buying the full-blown SQL ConstantCare® product, or if we still needed to sell the videos separately.

For our live training classes, the corporate purchasing department often buys the tickets for the students. Those purchasing folks get twitchy when they see a subscription renewal notice in their cart like that they’re going to get charged again later – so for now, we’re not including SQL ConstantCare® in the live class purchases. I think we’re going to soon, though – I’ll just tweak the checkout process so that they get one year of SQL ConstantCare® with no warnings about renewal.

There was a consulting change right away I didn’t expect: sales prospects started and asking, “Which service is right for me, SQL Critical Care® or SQL ConstantCare®?” The way I’ve answered it is that our 3-day SQL Critical Care® is like an emergency room for SQL Server where we work together face to face to solve an urgent problem. SQL ConstantCare® is more like ongoing therapy where I send you one email per week with advice over a long period of time. I’ll need to work on the web site to make that decision easier for folks.

In subsequent posts in this series, I’ll talk about drawing the line for the Minimum Viable Product (MVP), what we learned from the first few rounds of users, how we’re aiming for GDPR compliance, and more. Next week: what we decided to collect in v1 – and what we avoided.


[Video] Office Hours 2018/3/21 (With Transcriptions)

This week, Brent, Erik, and Richie discuss replication, SQL Source Control with Git, installing service packs in cumulative updates in SQL Server, query tuning, write errors, log backups, Azure-managed instances, 3-rd party backup software, and DBA porn.

Here’s the video on YouTube:

Office Hours Webcast - 2018/3/21

You can register to attend next week’s Office Hours, or subscribe to our podcast to listen on the go.

Enjoy the Podcast?

Don’t miss an episode, subscribe via iTunes, Stitcher or RSS.
Leave us a review in iTunes

Office Hours – 3-21-18

 

Erik’s speaking at SQL Saturday NYC

Erik Darling: I know that we’re due to start soon but I wanted to just let anyone watching know that I’ve been accepted to present at SQL Saturday in New York City on May 19th. So if anyone is, I guess, anyone from the tri-state area wants to come on down and watch me babble about stuff, I’ll be there. One session I know that got selected was Query Optimization Explained in 15 Demos. I might have a second one since they said that there are still some open slots, so I don’t know.

Brent Ozar: Awesome.

Erik Darling: So I’ll be doing at least one there. So I look forward to maybe seeing all of you from a distance.

Brent Ozar: And we’re also always ready – or we’re always interested – to hear what people would want to hear us talk about too. So if you’ve got ideas for sessions you’d want to hear us give, feel free to put that into the questions as well.

Erik Darling: Replication. Fixing replication disasters.

Brent Ozar: Damn it…

Erik Darling: What stinks is you’re going to say that and people are going to say, like, a lot of HADR topics are going to come up and I am just gasless when it comes to HADR topics. I have nothing to say about failover clusters anymore. They exist; I’m sorry.

 

I have replication and Always On Availability Groups…

Brent Ozar: Speaking of which, so RimJim starts with a question. He says, “Hello, I have a replication question. We have Always On set up and the primary is also the publisher. We don’t have an AG installed. If I want to remove the main publisher…” None of us in here know. This is the funniest thing about doing the replication. One of the slides [inaudible] talks about going to dba.stackexchange.com for posting multi-paragraph questions, and we just tend to avoid replication like the plague; really because it is the plague.

Erik Darling: It is. It is a pox upon your servers. It’s such, like, an outdated technology at this point. There’s just so many replacements for it that are easier to deal with and are less brittle and break less.

Brent Ozar: And it’s not like it’s bad. I get that people go, oh this works for me and I want to keep using it. That’s cool, just you get that, as you’ve tried to ask questions because the thing breaks all the time and you’re trying to get changes made to it and nobody’s around to answer, you go – man, go look at Amazon.com – go look at the books on replication and then look and see what the authors are doing now; it’s not replication. They’ve all gone somewhere else.

Erik Darling: If you need, like, a primer in some troubleshooting stuff, Redgate has a free PDF about replication, but it’s old. It’s from 2009 or 2010 or something. Like, no one’s keeping up to date on that. Like, replication just hasn’t changed where there’s like a constant need for replication experts to keep their skills up to date. It’s like all this is the same problems for the last 15 years.

Richie Rump: Replication does not bring to the boys to the yard.

Brent Ozar: It keeps the boys from the yard. RimJim follows up with, “The reason we have replication is – we don’t care… “Because the subscriber, we have custom scripts that do not replicate all the delete commands.” You can do that with SSIS too. You can do that with stuff like continuously copying data from one place to another. You’re right in that this is what it was made for, that’s a good thing, but it’s just painful. It sucks. Richie, I don’t think he wants to ask questions.

Richie Rump: No, no, no I was waiting for the green light here.

Brent Ozar: Green light, go for it.

 

How can I get started with source control?

Richie Rump: Alright, so Anon would like to hear more about SQL source control with Git for a total newbie.

Erik Darling: There you go, Richie.

Richie Rump: No, you guys are the SQL experts. Please, go ahead. Talk to me about Git; I’d love to hear what you have to say.

Erik Darling: So, like, the Venn diagram of SQL expertise, and then like Git and source control expertise – if you can picture my hands as circles and then picture the Venn diagram slowly expanding as you add these words into the equation, they are far apart.

Richie Rump: And once you get Pluto, keep going.

Brent Ozar: Pretty far. My thought was, I couldn’t get into it until Jeremiah kind of forced me into it. He’s like, look here, let me show you how it works. And I think I would go – instead of coming at it from a SQL source control angle, I would go to your developers. Talk to your developers because they’re already really comfortable with Git, GitHub, TFS, whatever kind of distributed source control that you want to go use.

Richie Rump: Yeah, and there’s different levels, right. Because you could just use Git and just throw things in there and everything’s fine, but when you start working with other people – the Git workflow and there’s different ways of doing it and branching and merging – it can get pretty confusing pretty quick. So I understand why, you know, all these DBAs who are now forced to use Git, they’re really confused and they don’t want to deal with it. I understand it.

Erik Darling: I mean, if you wanted to give someone a pillow-landing for git, you could try using GitHub Desktop, which, kind of, GUI-izes a lot of the stuff that you have to do. That’s what I use because I’m soft. So it helps in that regard because I can do stuff and I can fix stuff. I’m still a little gun-shy when it comes to fixing a conflict on a merge or something, but for the most part – for the scripts we use, I know them well enough. I have enough domain knowledge about the scripts where it’s not awful for me to use the GUI interface for it. If you wanted me to use a command line, like actually use a Git, I would run screaming. [crosstalk]

Richie Rump: Yeah, even I don’t use the command line. Visual studio code has a Git lens hook-in – plug-in – and it works great. Source Tree – I use Source Tree for visualizations so – I mean, I could use the command line just like anyone else but I don’t need it.

Erik Darling: No, good stuff.

 

Should I stop SQL before installing CUs?

Richie Rump: So Don asks, “What are the best steps to installing Service Packs and Cumulative Updates on SQL Server; change to single user, stop agent service, stop service?” Blah, blah, blah, go on, go on, go on.

Brent Ozar: I’m a huge fan of – like if you have any high availability – patch the secondary first. Do it during the week, when you’re sober, like between nine and five. Once you’ve done it a few times, you get used to what you’re able to do without a reboot and what actually requires a reboot. But then that way, if the thing’s all patched up and ready to go in, it’s just a secondary. Then you can failover to it during your outage window; make sure that everything’s okay.

And in that scenario, the whole thing about stopping the service and agent and all that is much less relevant. You just don’t have to worry about it as much. If you only have one server and it’s up and you have to do the patching live, oh my god, it just makes me nervous. Like, you can do it but the whole concept of stopping agent and stopping the service and all that is a giant pain in the rear. I’ll tell people, if they want me to performance tune and get it right to that level, just give me a secondary instead.

Erik Darling: The other thing you’re looking at is cutting off access from the application so that no users can try to sneak in and hook in, or whatever. You know, if it is just a standalone instance, you really have no choice but to ask for whatever maintenance window you think you need to install and then rollback if the install, something goes awfully bad with it.

 

How can I change this scan to a seek?

Richie Rump: Okay, so [Tishal] – if I mispronounced your name, I’m so sorry. I write code all day, I’m… “I have a TOP1 ORDER BY descending query that has an index on it, but it does an index scan and reads about eight million records. sp_BlitzCache shows parallel warnings; no missing indexes. It’s one of the most expensive procs in terms of CPU. What can be done to change an index seek? Could cast be a problem?”

Erik Darling: There is a cast function in it, like…

Brent Ozar: There’s casting UTC reads…

Erik Darling: I mean, that’s not going to change much.

Brent Ozar: Yeah, I have bad feelings. Everything about this gives me bad feelings.

Richie Rump: Why? What are the code smells?

Erik Darling: I don’t know. What I want to do is see the query plan. I want you to take the XML and put it on PasteThePlan and then, if you want like super detailed help with it – because the amount of information you’re giving me just isn’t enough to tune a query with. When you say there’s an index on it, I’m like, okay, there’s an index on lots of stuff. Post it in dba.stackexchange.com – what a nice time for that slide to pop up. I don’t have anything to do for the rest of the day, so if you post it on there, I’d be happy to take a look and see if there’s anything…

Richie Rump: Well I have some issues that you could help me with, Erik.

Erik Darling: Okay [crosstalk]. Just try it in Postgres, Richie. Write a unit test.

Brent Ozar: The one thing that does stand out in there is you’re like, why can’t I get an index seek? If SQL Server is trying to compare two different fields because you’ve got two different fields in there I-date and B-date. Two different fields and either of them could be less than some number. My guess is that SQL Server is going, this is going to match a large percentage of the table; why bother doing index seeks on it? You could try rewriting it as a pair of unions so that one searches for I-date and one searches for B-date, but it’s not going to be a small change.

 

Should I partition to fix blocking problems?

Richie Rump: Okay, so Pablo asked, “Ola, amigos…” Ola Dora, [speaking Spanish]…

Erik Darling: It would be racist if anyone else did that. Richie gets to do it, using his Spanish privilege against us.

Richie Rump: The guy who is third-generation Mexican American and speaks a little Spanish who lives in Miami. It’s my burden to bear. “I have a third-party app that is currently making a huge amount of transactions per second on a particular table. It is getting heavily locked and the app is giving timeouts. Do you think that some kind of partitioning in the table may help the locking problem? Or maybe another suggestion?”

Erik Darling: No, not partitioning anyway. Just because if you’re not getting adequate chunks of data eliminated with the indexes you have now, I think trying to add partition elimination into the mix is not going to make that any simpler or give you any extra guarantees that it’s going to work.

Brent Ozar: And always define what huge is, because numbers mean different things to people. Is it like 10,000 batch requests per second, 50, 100,000 batch requests per second? You know, give me a rough idea there. Plus, too, if you’re running into lock escalation, when you say huge numbers, that immediately makes me think, if you’re tripping up past 5000 rows locked – because we’ll end up doing index scans instead of seeks – partitioning it probably isn’t going to help too much because you’re still going to escalate pretty quickly when you’re locking thousands of rows.

Richie Rump: Or when you’re popping and locking.

Brent Ozar: When you’re popping and locking. You did say, “Do you have another suggestion?” I’d say, depending on what the primary – if it’s readers blocking writers or vice-versa, RCSI might be a good fit; read committed snapshot isolation. If you go to brentozar.com/go/rcsi, we have posts about that. it’s a setting that you can just flip a switch and magically, blocking starts to disappear. There’s just so many gotchas with it that that’s why we have the whole page about it.

Richie Rump: Well, like tempdb would be the biggest one, right?

Brent Ozar: Yeah, the tempdb throughput and size. If someone doesn’t BEGINTRAN, their tempdb can just start growing like crazy. Richie, you’re going to kill me but your microphone’s doing the thing again, the static-y thing. I know right. Hold on – so that’s that. We’re all playing chicky. Go ahead…

Richie Rump: Is it better now?

Erik Darling: Maybe.

 

We have this problem with Windows failing to write…

Brent Ozar: Mike Chubbs asks, “We’re having this error that our Windows folks have no idea where to look at. The requested resource is in use when I try to write. During a write, it offset…” Blah, blah, blah, blah, “In file.” Two words – except I guess it’s really one word – antivirus. So it’s typically a filter driver; something that’s intercepting writes. Antivirus grabbed a lock on something. A filter driver – so you’ve got some kind of filter driver that’s trying to do compression or encryption or snapshots. Those are all common causes for that.

Erik Darling: I know that Semantic, or however you pronounce it, is kind of notorious for that sort of thing. So check and see if you have Semantic running anywhere. I know it’s going to sound like the awful, awful thing to say, but if you run Xperf, you can usually find the filter drivers that are using a whole bunch of CPU and doing stuff and figure out which ones are, you know, messing with your files. Good old ETW trace.

Brent Ozar: Nice. Richie, you want to test your microphone?

Richie Rump: Am I back? Hello.

Brent Ozar: You are back. You sound great.

Richie Rump: Okay, I need to stop using Chrome. I think Chrome is my problem.

Erik Darling: You have a Chrome problem?

Richie Rump: I may – I make everything chrome…

Brent Ozar: Justin Bieber.

 

Can a log backup cause a stack dump?

Richie Rump: Alright, so Gordon has a question. “Is it possible for log backup to have an exception access violation and subsequent stack-dump?”

Brent Ozar: Sure, corruption. Corruption will do that. If it tries to backup corrupt data, that will totally do it. I would check CHECKDB for starters. Anytime you have any kind of error-running backup, I would probably be worried about CHECKDB.

Erik Darling: I’m trying to think, that isn’t like guaranteed to be a CHECKDB thing. It could be an underlying – is there a different disk that you can try taking the log backup to, just to make sure it’s not a pure hardware issue. That might be something to do if the CHECKDB comes back clean.

Brent Ozar: And if you’re backing up to a network, try local. If you’re backing up local, try the network.

Erik Darling: And, you know, it might be a situation where, like, you have to add a second log file and get rid of the first one because it’s on a bad bunch of storage or something, too.

Brent Ozar: I should have also thrown out too, weirdo backup software, I’ve got a rash of calls in lately going, I use such and such backup software, and when we look at their data, they’re not doing backups. I’m like, I don’t know what that thing does. But generally, if you paid less than $100 for your backup software, it’s probably not doing what you want it to do. Richie laughs – it’s true. Big customers have emailed in like three times during sales prospects and they’re like, I use blank, and I go to the website and I’ve never heard of it before. I’m like, $49 with a 90-day trial?

Erik Darling: You just can’t software that costs less than $100.

Richie Rump: My copy of DataGrip cost more than that.

Erik Darling: I was bummed that didn’t work for me. The connection string worked first time in Postgres – [crosstalk]…

Richie Rump: But there’s something going on there. You just have to dig into it a little bit. Since you’re not doing anything for today, you could…

Erik Darling: That’s not true. I have to answer that person’s question about the query.

Richie Rump: Oh okay. I mean, that’s Robert’s question, right, who wants to know how to troubleshoot an OLEDB wait type.

Erik Darling: Ooh, ole.

Brent Ozar: It’s early for Cinco de Mayo. Linked servers is one way to do it. I’d say, hold up though, what makes you think OLEDB is the problem. Because often, people will run wait stat scripts like sp_BlitzFirst – we’ll do the same thing. We list OLEDB because sometimes it actually can be bad, it’s just super-rare. If it’s one to one with clock-time – like if your server’s been up for 1000 hours and you have 1000 hours’ worth of OLEDB waits, it’s harmless. It’s a monitoring tool like SQL Sentry, Quest Spotlight, IDERA SQLDM. He says, “I get it in the monitoring script.” There’s so many. I get it in the monitoring script too.

Erik Darling: Don’t we all? Like Brent said, remote queries will cause OLEDB, DBCC CHECKDB – it will register OLEDB waits – it just might not be that bigger deal, unless you’re running CHECKDB 24/7, just in a loop.

Brent Ozar: Oh god, he follows up with, “I’m moving 180,000 rows via linked server.” Yep, well, that will do it.

Erik Darling: We found the problem.

Brent Ozar: He says, “The waits go to two seconds and then drop.” Yep, that’s exactly what it does, is reset every two seconds. It still could be waiting on OLEDB, but if you’re trying to move 180,000 rows faster, I usually keep a USB floppy drive or something like that around. You know, a CD-ROM maybe might be faster than trying to move that kind of data around.

Richie Rump: Or SSIS?

Brent Ozar: SSIS – that’s a good one.

Erik Darling: Or, you know, just not moving 180,000 rows at once, you know. Move like 1000 – 5000 rows at once; something that’s a little bit easier on your servers and on your network and all that other stuff.

 

Any opinions about DACPAC deployment?

Richie Rump: Yep, okay. So, Brian has a question. Somewhat of a follow up to the SQL and Git source control question, “Have any of you worked with, or have any strong opinions, about the DAC-pack/SSDT based development and deployment?”

Brent Ozar: Yeah, so I tried it when they first came out. It sounded like a really interesting thing. And I try to keep optimistic about, hey, Microsoft’s got a new technology. I should play around with it. And the problem for me with DAC-pack based deployments was, if I wanted to make a schema change to a large table, it was really weird how it worked and it generated a ton of log data. So it seemed like it worked across small databases. And I’m going to use, as a number for small, less than 5GB of data. But it totally fell apart at like 50GB of data. The deployment was just brutal. Plus dropping columns was horrific.

I found that I could accidentally drop columns during the deployment and it was way too easy to go and do. So I was burned out on it within the first six months. I kept trying it and would go, ah it’s just not working for me. I haven’t heard from anyone else that they swear by it, so I don’t know.

Erik Darling: A lot of people who have to do that sort of thing just use Redgate SQL Compare and have much better luck with that.

Richie Rump: Yeah, I prefer SQL Compare. I am not a DAC-pack person whatsoever. You just don’t have a lot of control, from what I’ve used of it, in those deployments. At least with SQL Compare, I could say, here’s A, here’s B and give me the deltas and then I have a script and I can review it and whatnot, as opposed to, I’m just throwing this thing out there and it’s going to do something and hopefully it will work. I don’t know.

Erik Darling: So I’m going to give Redgate SQL Compare the best accolade that I can give a piece of software. It was so easy, I could use it.

Brent Ozar: I’d agree. The wizards are nice, the side-by-side comparison thing.

Erik Darling: It highlights where things are different.

Richie Rump: Does that work for Postgres? Because lord knows I could have used it yesterday.

Erik Darling: Why don’t you use – what was it you told me about? KDiff3.

Richie Rump: KDiff is essentially just for get and compare text files. It’s not going to go into your database and say, okay this is this and this is that. So what my problem was, yesterday, I did a deployment to Postgres and I had some scripts. And as I usually do, I’ll go ahead and put all these updates into a different script and I’ll keep track of it that way, if it’s gone into production or not.

I didn’t update the scripts from some of the updates that I had made in source control, and so I went ahead and deployed an old version of updates. And that’s when Brent saw all of this red and started laughing at me and things like that. But I fixed it. I fixed it.

 

Why do we need backup software?

Richie Rump: So Steve has a question, “Why backup software? Can’t we do this with SQL Server tools?”

Brent Ozar: My thought, from working – used to work for Quest, and so I was always worried about that question when people would come and ask, what do you need LiteSpeed for? Can’t we just do a native backup? The compression options, the encryption options, the monitoring, the log shipping GUI, all that kind of thing is phenomenal with third party tools.

Now, the ability to do integration with storage snapshots – there’s all kinds of – smart differentials, or one of my favorites, object level recovery. If you’re doing backups with LiteSpeed, you can pluck an individual object out of the backup. Because, newsflash, people don’t call you to restore a 10Tb database. They call you to restore one table out of the 10TB database, and then you don’t have any space to go do it. You have to shuffle it around from one place to another. Man, third party backup tools make that kind of thing way easier.

If you don’t have – I may make a generalization – if all your databases are under 1TB and you’ve got, say, less than 100 servers, then you could probably get by with native. But as you go past 1TB, man, the third-party stuff is phenomenal.

Erik Darling: I look forward to object level restore, people in Azure start having to restore large things in Microsoft, all of a sudden realizes, oh yeah, maybe it’s not so much fun to babysit a 25TB restore when I just have this 50,000-row table that I need to get back.

Brent Ozar: And especially if they’re paying SLA refunds to people when it all starts to take three days.

Erik Darling: Another cool thing about tools like LiteSpeed, beyond what Brent said with the object level restore, is transaction reading. So again, what no one ever tells you – all sorts of like great DBA porn – no one ever tells you to restore just one table. No one tells you to restore that table just to the last transaction log backup. There’s always, like, you know, we’re taking hourly transaction log backups and something happened earlier this morning, and you’re like, cool. I don’t know what that was. You don’t know when that was and you don’t want to have to go through that crazy restore process to get things back up to here and then realize you went too far and have to start the whole process over again. You really just want to get to the point where something bad happened and stop.

Richie Rump: So is DBA porn like, look at the IOPS on that server?

Erik Darling: Yeah, stuff like that.

Brent Ozar: Scan paged, PFS pages.

Richie Rump: Look at the waits on that guy…

Brent Ozar: Or when you go into DBA cubicles, it’s always the PerfMon counter poster up on the wall.

Erik Darling: It’s so true.

Brent Ozar: It is. That’s what we put. That’s exactly it.

 

What’s up with Azure Managed Instances?

Richie Rump: So you guys were playing with a new service that Azure just released. Why don’t you tell everyone about that and, kind of, what you found out about it?

Brent Ozar: So, Azure managed instances – this is kind of, I’m going to use the term replacement – but I think it’s the thing that people are going to light up in excitement for over Azure SQLDB because you get cross-database queries, you get easier backups and restores, you can log ship up to it, in a weird kind of way, if you want to restore full backups up in Azure Managed Instances. It has more compatibility with more features that Azure SQLDB doesn’t allow you to support. You can query DMVs, you can go into the master database, you can put stuff in the master database for that matter, if you want.

It’s relatively affordable. It’s pretty cheap when you think about it and you can even buy it in eight cores, 16 or 24 cores. You don’t have to worry about memory. You don’t really have to worry about storage even. There’s just a slider for how much storage you want. It’s not provisioned by price or anything; it’s just a slider for storage – how much storage space you need.

Erik Darling: Does storage size tie into storage speed?

Brent Ozar: No, there’s just a size thing and that’s it. So it’s pretty impressive. Easy to set up. Some of the things that we found were that there’s all kinds of new DMVs. There’s these new feature switches where it looks like they’ll be able to turn things on and off more gradually. There were thousands of features in the feature switches table. It’s pretty neat stuff. It gives me excitement. It’s DBA porn, I suppose you could say.

There was an issue that I caught around database corruption where you could corrupt a master in a way that your entire in managed instance would be unrecoverable. I have every confidence they’re going to fix that before go-live. But it’s like anything else, all software has bugs.

Erik Darling: I mean, this is brand new when we looked at it, too. The stuff that we found isn’t stuff that is going to be there forever or stuff that you have to be eternally vigilant about. Like, there’s a whole bunch of stuff in SQL agent that doesn’t work. Like, you can’t call jobs that use command shell. You can’t call jobs that use PowerShell. You can’t call SSIS job steps from it. There’s no maintenance plans in there. You can’t use Ola Hallengren’s scripts. There’s stuff that doesn’t work yet; all stuff coming down the pipe. The T-SQL things are different between managed instances and real live T-SQL, but you know, it will get there. I have a lot of confidence in this so far. I think they’re doing a lot of things right.

Brent Ozar: Man, for a preview, it was freaking phenomenal. Just awesome.

Erik Darling: My biggest gripe was – sorry, go ahead…

Brent Ozar: Your biggest gripe was what?

Erik Darling: The memory didn’t go up when the core count went up, but again, previews.

Brent Ozar: Alright, well thanks everybody for hanging out with us this week and we’ll see y’all next week on Office Hours; Adios.