This week, Brent, Erik, and Richie discuss using foreign keys in a data warehouse, SQL Server virtualization on Hyper-V clusters, having all of your SQL Server instances under the same active directory account, updating stats, changing job ownership, why you would want to go to SQL Azure, and why they have an issue with replication.
Enjoy the Podcast?
Office Hours – 6-20-18
Can I use foreign keys on columnstore indexes?
Brent Ozar: Tom says, “Are foreign keys compatible with clustered column store indexes? Are you allowed to do foreign keys with column store indexes?”
Erik Darling: I think I remember there was – like, 2014, you couldn’t.
Brent Ozar: Yeah, he’s getting some kind of error in object explorer. He’s getting an error; column store index cannot be created on a table with a foreign key column. However, if you hit okay, it creates the clustered column store index. My guess is maybe you might even have an older version of SSMS where you’re running into this. I haven’t seen that for sure. And it may be creating the clustered column store index and ripping out the foreign key; who knows? We’ve had the discussion before about should you have foreign key relationships in a data warehouse. Well, let me rephrase this question to should you use foreign keys in a data warehouse?
Erik Darling: I generally don’t care and I usually err on the side of no just because when you’re doing ETL and you’re loading stuff, I don’t want the extra overhead of having to check that, especially when you get to the size of a data warehouse where ETL processes really start to matter. Like, you hit the terabyte plus size, you don’t want every single load to be checking over millions to billions of rows for foreign key integrity. Unique constraints are another one that I would avoid in there too. It’s just like, at that size, the benefit you get from column store far outweighs the benefits that you would get from unique constraints or foreign keys.
Brent Ozar: I’m with you. Richie, any thoughts?
Richie Rump: It should be taken care of in the OLTP.
Brent Ozar: Yeah, I’ve gotten burned so many times with data warehouses that I’ll have a bad file and I’ll want the rest of the files to still load because I only want to load the file that had the error. I’m going to load that again later just by itself, but I don’t want to stop other tables from loading just because I have one error in one file. I have such a limited nightly window, I’d rather get it out and done fast. And especially since you said column store indexes, that made me think it’s a reporting system; it’s probably not a transactional system. Somebody just had – Erik had a post this week about join elimination too. Foreign key join elimination, I’m so happy when it happens, but it’s so rare compared to other performance tweaks I can get.
Erik Darling: yeah, it’s a nice touch. Like, I appreciate the amount of time and Ph.D. students that Microsoft has thrown at the optimizer to get it to be as lazy as possible and get rid of that stuff as early in the plan as possible; it’s just not even thinking about it. I touched on it a little bit at my session at SQL Saturday New York when I was doing the optimizer demo. I touched on join elimination and stuff like that a little bit. Like, it’s really cool. I think it’s awesome, but I also just don’t count on it because who knows what some noodle-head is going to do to mess it up. So it’s not like I’m designing stuff to be, like, this is going to perfectly eliminate joins every single time. I’m like, this will probably happen most of the time, some of the time, somewhere in there.
Brent Ozar: And all it takes is one cumulative update where they tweak something and then all of a sudden you don’t get it anymore. It’s nuts.
So we’re getting started with Hyper-V…
Brent Ozar: Daryl says, “So we’re getting started with SQL Server virtualization on Hyper-V clusters.” Daryl, what year is this? Holy cow, you’re just now getting started.
Erik Darling: Hyper-V, of all the choices…
Brent Ozar: I wasn’t going to go there. [crosstalk]. I don’t know the last time we had a Hyper-V client.
Erik Darling: I had one. It was at least a year ago. It was to the point where I was just like, so, let’s look at some performance metrics, and they were just like…
Brent Ozar: We can’t.
Erik Darling: Yeah, you got some? I’ll buy some from you, some metrics that fell off a truck.
Brent Ozar: That’s so bad. [crosstalk] For those of you who haven’t played around with it before, VMware at least, whenever you’re looking at perfmon counters inside the guest, you can see data about the host and it automatically updates as you move from host to host. You can see things like how much CPU the other guests are using, how long you’re waiting for CPU time. Hyper-V, it’s left as an exercise for the reader. You’ve got to go monitor all the hosts and then you’ve got to correlate back to which guest was running on which host at which time; ain’t nobody got time for that.
So, Daryl says, “My server admin says we can pre-allocate CPUs or we can set the VM to dynamically allocate CPUs. What do you recommend?”
Erik Darling: Come on, man. This is your SQL Server. Why would you dynamically – you realize you have to pay for them, whether they’re dynamically assigned or not, right. You’re like – Microsoft doesn’t say, “Oh, well we can take seven grand off this month. Thy didn’t dynamically assign that.” No, assign the number of cores you need, adjust as necessary. That’s the beauty of virtualization, you know. You get to figure that stuff out. You could prep your machine if you know something big and bad is going to happen, if you have like a Black Friday sale or Christmas or whatever holidays you people sell stuff for. It’s nice to be able to pre-allocate those kinds of resources to a machine. Same thing, also – just because I use VMware for my home lab, I wouldn’t use it for production anything, but when I do home lab stuff, I use VMware. Over in the memory setting, make sure that’s not dynamic. Or if you leave it dynamic, make sure – I forget what the term is, but there’s a percentage down at the bottom, you want to have that up at like 80-90%. I just remember doing some load testing on SQL Servers in the past that were on Hyper-V. When I would leave the memory buffer percentage – I think the default is like 20% – things didn’t go well. Like, BlitzCache wouldn’t run, I thought there was a bug like I had massively messed up XML or something. As soon as I jacked that up to 80 or 90, everything ran really quickly. So that’s another little slider to be careful with VMware.
Brent Ozar: My whole problem with dynamic CPU and dynamic memory is they both default to relatively low numbers, so when you use the idea of dynamic CPU and dynamic memory, they’re going to default to, say, two cores and 8GB of RAM. What happens if your VM restarts? What happens if you do a patch? What happens if you have a failure and for some reason it moves from one host to another and you start at an artificially low number? Sure, someday it’s going to spin up and use more, but do you really want to deal with crappy performance at that time?
Erik Darling: It’s almost like another layer of CPU C-states, where you have to get the CPU, then they have to spin up, and then they get – it’s crazy…
Brent Ozar: Steve followed up and he said, “That sounded like Erik said no VMware in production.” I don’t think that’s what he meant.
Erik Darling: No, VMware is like – vStandard, right. Like if I had to do anything virtualized in production where things mattered, I would use VMware. I would not be using Hyper-V. that’s just how I feel about it.
Brent Ozar: Yeah, I know it was a good sell for a while because it was cheap, but I don’t know that they won that race.
Erik Darling: Just like the VMware, the tooling, the management stuff, the insights into what your guests are doing, all the metrics you can collect about things, it’s all centralized in vSphere. It’s really nice and it’s well worth the cost to have all that extra stuff, unless you just don’t need it. If you don’t need it, go ahead, Hyper-V away, but if I had to do anything serious, it would be VMware.
Brent Ozar: I always hear the Microsoft folks, the people who are really hardcore fanboys who will also say you can get all of that with System center, you just have to build it yourself. Yeah, but this is the year 2018, I don’t have time to build myself something and I’m going to do a really crappy job of it when I go and build it from scratch. And then when I walk out on vacation, all of a sudden, I have all these problems when I come back because somebody broke all my stuff.
Is it bad to use a single AD account for all my instances?
Brent Ozar: Daryl says, “All my 51 SQL Server instances are owned by the same active directory account. Is this terrible?”
Erik Darling: Not as long as nothing happens to that active directory account.
Brent Ozar: A great story there – one company I was working for, one of our employees was really pissed off and they left and they just happened to know that all of the SQL Servers were running under the same service account. It was set up that way long before I got there. I don’t know that I would have known anything different at the time. And what this person did was they went to Outlook Web Access and they tried to log in with the SQL Server account on purpose and they repeatedly used the wrong password on purpose, thereby locking out the SQL Server service account across all of my SQL Servers. That was a good time.
Erik Darling: I salute them.
Brent Ozar: I was so impressed. And the guy immediately knew that it worked because all our websites went down. God, that was brutal.
Erik Darling: So did you just unlock the account or did you have to change the password? Did you have to change…
Brent Ozar: The AD guys went – after we figured out what was going on, that the thing was getting locked out, because I’m like, I’m not doing anything. I don’t understand why it’s getting locked out. So they set it so that it could never be locked out again, but then, at the same point I’m like, now someone can just roll through and try every password over time. They have all the time in the world, so now I’ve got to start separating out all the services onto different service accounts.
Erik Darling: Which is no fun. That’s like no way that a DBA should be spending their time.
Brent Ozar: No return on investment there.
Erik Darling: End users are not like, woo-hoo, he did it. Yes, thank you, DBA Brent.
Richie Rump: I didn’t even notice anything happened.
Brent Ozar: Everything’s just as slow as it ever was.
Erik Darling: He did it; he made no difference.
Brent Ozar: Daryl follows up with, “I had a contractor change all of my database owners to SA. Is that okay?”
Erik Darling: I mean, I’d prefer that if I’m being honest about things. So, it’s not like if you have an agent job owned by an AD account, the AD account gets screwed; the agent job might start failing or whatever. It’s just as far as, like, when I think about users that I want to have elevated permissions on a database, I want to be able to assign those. I don’t want them to just have them because they’re the database owner. So having all the database owners be SA just makes that a little more clean and clear-cut to me and then I can assign – like, if that user needs SA on the database, I can give it to them, but they don’t just have it. They’re not going to be able to, like, fistfight SA for things.
Brent Ozar: And I think the public should hear too, in terms of how we handle database security at our company, right now there are changes that I need to make inside the database and I can’t do it because I’m not sysadmin on the database. I don’t even have write access to most of the tables, and I’d love that. I’m like, if something screws up, it’s not going to be because I fat-fingered an update statement because I don’t have permissions to do it.
Erik Darling: Yeah, we give all the security stuff to Richie’s German side.
Richie Rump: Yeah, they gave all the permissions to the guy who’s never been a DBA one day in his life.
Brent Ozar: And he’s doing the query tuning today too.
How should I handle disaster recovery for replication?
Brent Ozar: Steven says, “Do y’all have any recommendations for replication disaster recovery or is there anything that I should do besides just reinitializing replication from scratch?”
Erik Darling: I would argue for your use of replication for disaster recovery, Steven.
Brent Ozar: I think what he means – and I’m going to go out on a limb – I think what he means is he has replication and when he fails over to DR, he just wants to figure out how he’s going to make it working again.
Erik Darling: Oh, yeah, I don’t know then.
Brent Ozar: I don’t either.
Erik Darlin: Like capital R replication, it’s like I don’t know anything about it and I want to keep it that way because everything I hear about it is awful. It’s like, I don’t know, like Dallas BBQ. Like, I’ve never been there. It’s a restaurant chain in New York. There’s a few of them. People in there don’t look happy. I’ve never heard anyone coming out of there being like, that was the best barbeque I’ve ever had. I just kind of walk by like there but for the grace of god go I. Like not even – I’m going somewhere else.
Brent Ozar: People are drinking in there not because they’re happy, but because they’re sad.
Erik Darling: Yeah, and just big drinks that clearly came out of a machine with a little pour handle on it. Not anything that anyone cared about.
Brent Ozar: Yeah, that sounds – the only person on the team who’s ever done replication is Tara. Like, I’ve played around with it but not really done serious support in production. And she is very vehement that she does not want to support replication again either, so.
Erik Darling: Replication is so bad that it made her love Availability Groups. Just leave it at that.
Brent Ozar: It’s like some kind of Stockholm Syndrome. You’re not as bad as the kidnapper who made me deal with replication.
Should I flush the cache after updating stats?
Brent Ozar: Ronnie asks, “I was reading an article on updating table statistics using the update command. The author mentioned flushing the cache after updating stats using DBCC FREEPROCCACHE.” What the hell?
Erik Darling: You should fire that author.
Brent Ozar: Yeah, you should paste in and who that author was, Ronnie, if you remember it – we’re not going to mention their name on the air, but we’re going to make fun of them privately. He says, “Now I know that updates the cache at the server level, but I’m not sure if that’s a good idea or not. What are your thoughts on running DBCC FREEPROCCACHE?”
Erik Darling: Do you want to go first?
Brent Ozar: Would you like me to?
Erik Darling: It’s fine either way.
Brent Ozar: I don’t have a problem with it if somebody wants to do it because they think it’s going to solve a problem. The problem is with parameter sniffing, every new query that comes in from that point forward is getting optimized for whatever value someone happens to pass in. it’s like playing a whole lot of games of Russian roulette all at once. You’re just randomly spinning the barrel with whatever parameters get passed into a stored proc. I don’t like playing Russian roulette. I would rather not play that if I could so I would rather just coach the people in the – hey what do you think you’re fixing by running that command and let me see if I can fix it another way that doesn’t involve so many pistols pointed at my head.
Erik Darling: That is like Russian roulette with a Gatling gun. Like, one of those is just going to be it. So I think my problem with it is that it’s a very heavy-handed way of fixing a problem that’s probably pretty narrowly scoped. So you know, really it’s just like scorching the earth when you just need to pull a few weeds, I think. When you update stats anyway, if you had plans that were referencing them, they would be invalidated and likely recompile anyway, so there’s not a whole lot of value in also wiping out every other plan in the cache along with the ones that would be affected by the stats update. So stats update yes, free proc cache probably not. There are ways to target free proc cache to a specific SQL handle or whatever so you can, like, not go to a specific plan from cache, which is preferable to just wiping out the whole thing because seriously, you most likely don’t have a problem with the entire plan cache, just a few little pieces of it, like Brent said with the parameter sniffing.
Richie Rump: Why are you always so violent with the Russian people. I don’t understand…
Brent Ozar: The mafia and the gun jokes. I’ve been more cognizant of that lately. Like all of a sudden, I hear how many times I make gun references. And you would think I’m some sort of like violent person – I don’t watch boxing, I don’t own a gun, I never have. I fired a shotgun once and that was enough for me…
Erik Darling: He still has a bruise from it.
Brent Ozar: Yeah, these are not large arms. Recoil is not a word that’s friendly to me. Like, damn that was loud. If I wanted to hear something that loud I’d go to a Metallica concert – which I do, I mean, I like hearing things out loud, but you know, I don’t want to have to blow back on my arm whenever I do that.
Who should own my Agent jobs?
Brent Ozar: Daryl asks, “I wanted my contractor…” Daryl, why would you have a contractor? “I wanted my contractor to change my job ownership…” It’s probably the same contractor Richie’s using. “Job ownership to the same active directory account that owns the databases, but he changed the job ownerships to SA. Is that okay? Was I wrong to begin with? Who should usually own jobs?”
Erik Darling: I’m down with SA owning the jobs.
Brent Ozar: Yeah, I want the jobs to work even if some yo-yo leaves the company, locks their login, you know, if we disable someone’s login because they’re having problems.
Erik Darling: Also, have you ever run into a position where you needed a specific active directory account that would have permissions. Let’s say you had to access a FileShare way off in no-man’s land and you needed a specific account to do it. Have you ever run into that? Because that might be when you don’t want SA…
Brent Ozar: Yes, I have with agent jobs. I’ve had – like where I needed to deal with a specific file target, but I just gave the agent account permissions on that target, but that helps when I have a different agent account per server though too because if I had the same agent account across all my servers, that would suck. Suck is a strong word there, but…
Erik Darling: Less than ideal.
Brent Ozar: I would want to know, Daryl, more about the account that you wanted to use that wasn’t SA and why you wanted to use it. I don’t think we’d have a problem with that either. If you had a special AD account that you wanted to use for job ownerships, I think we’d be okay with that.
Richie Rump: Yeah, just let us know the URL to your domain and the user ID and we’ll try to lock you out.
Brent Ozar: We’ll check the password length for you. Just send us that…
Why would I want to go to SQL Azure?
Brent Ozar: Ron asks a very hard question. “Why would I want to go to SQL Azure?” Richie, why would you like to go to SQL Azure?
Richie Rump: I wouldn’t. I don’t know, I guess it depends on the app that I’m building, right. Do I need OLTP and I need to use Azure, well that’s probably [inaudible] that I need to go to. It’s not bad. I personally have never used it outside of just toying around with it, but if you’re familiar with SQL Server and you have to use Azure then go for it. Other than that, I’d probably just want to stay on-prem. If I have to go to a cloud, so you need SQL Server, do you need OLTP? There’s a lot of questions that need to be asked because now in the cloud, I can have any sort of database that I need and that’s a big bonus because now I can custom tailor my needs as the application to what data store is out there in the cloud. In fact, we have got an application that is using multiple – we’re using object data store and we’re using OLTP.
Brent Ozar: There’s one really narrow use case. Say that you’ve already got an application built and it doesn’t go across multiple databases; like it doesn’t do cross-database queries and you don’t want to host it on-premises, you want it up in the cloud with as little application changes as possible and you don’t use any of the features that Azure SQL DB doesn’t allow, agent jobs, for example, log shipping to other on-premises servers. Then that would be the use case where I would go if I wanted that hosted in somewhere that was really reliable and not in my company’s data center, it makes sense. But the instant that you trip to anything that Azure SQL DB doesn’t support – cross-database queries, the native way, agent jobs, all these other little features – then Azure SQL DB isn’t as good of a fit. Azure managed instances are a much more interesting fit. You’ve just got to make sure that the pricing makes sense for you and that their limitations like the number of replicas you have make sense for you as well.
Erik Darling: I think Azure managed instances are going to change people’s opinions about Azure pretty generally. There was a T-SQL Tuesday that was not well attended. There was like two posts on it because it was tell us about whatever experience you have with Azure, and there just wasn’t a whole lot of people being, like, let me tell you. Just because, you know, not a whole heck of a lot of adoption, especially – I don’t even know the right word for it. Like, I guess as far as people who are frequent SQL bloggers, no one’s is like, hurray Azure, my hero, you solved this massive problem for me. But managed instances, I think, are going to change a lot of that. managed instances have a lot of cool features about them. They’re sort of hybrid – you get agent jobs and a whole bunch of other stuff that you don’t get with Azure SQL DB. There’s a lot of good stuff about them and the hardware behind them is pretty beefy too, so you can put non-trivial applications up there and have them, you know, run pretty well, I think.
Why do you hate on replication?
Brent Ozar: Steven asks, “Why do you guys hate on replication?” Well I’ll be the first to answer on this one. I think it’s amazing when nothing in the app changes; like when absolutely nothing in the application isn’t going to change. Your queries aren’t going to change. Your tables aren’t going to change, your indexes aren’t going to change. But the instant that you have any changing in your application, like people add columns or drop columns to tables, they rework tables from scratch, they add stored procs and they drop stored procs, they add new databases. The more that this thing changes, the more fragile replication is and the more maintenance work that you have to put into it. Whereas when you look at alternative techniques like database mirroring, log shipping, always on availability groups, man, you can put a dinosaur in the database.
You can set the database on fire and it just keeps replicating off somewhere else. Security is less of a concern. The other problem I always ran into was people go, well I’m replicating so that I can offload the reporting query somewhere else. Wait, why are my reports three hours behind? Well, replication got behind because somebody did a bunch of updates or alter index – you know, changing a bunch of things – that cause it to have to do a new snapshot and reseed over again. Well, that’s unacceptable, we needed it to absolutely positively be on time. And it just doesn’t work that way.
Michael says, “Replication is a very powerful and useful tool.” So is a table-saw, you just don’t see me with one in my office. He says, “Especially for things like copying data from a prod SQL instance to a prod reporting instance. You offload the reporting query from your prod database…” Yeah, absolutely, it’s just the problem is as those tables change, replication is the least set it and forget it solution inside of Microsoft it is really not set it and forget it.
Erik Darling: So like, because we’ve hated on replication plenty, I will give replication one point in its favor and that is you can have replication move stuff over and then you can have like a post script run that adds different indexes for reporting. So with an AG, with mirroring, with log shipping, if you have a report that’s offloaded and is running like crap, you don’t get to create that index just on the secondary so that you can, you know, make that report better there. You need that index on the primary and that gets replicated over to the secondary. So that’s not great. So replication does have that in its favor. You can index specifically for, you know, whatever crazy Kookamunga reporting queries you have going on. So that’s nice. That’s a nice touch for the replication, but I would rather use almost anything else that’s less fragile, less prone to – I don’t want to have to send out tracer tokens and figure out what’s slow where. No thank you.
Brent Ozar: Not replicating deletes is another pro for it. You can only have your deletes happen on the production environment and not push them off to a secondary, so you can have a small primary and then a really large secondary. That’s cool too. Alright, and on that bombshell, we will leave. Thanks everybody for hanging out with us this Office Hours and we will see y’all next week. Adios!