This week, Erik and Richie discuss whether it’s relevant to specify data in logs in SQL cloud environment, licensing, using canary tables on Availability Groups, how Entity Framework limits tuning, reusing older databases, and more.
If you prefer to listen to the audio:
Enjoy the Podcast?
Office Hours – 7-25-18
In the cloud, should I separate the log files?
Erik Darling: Brad asks, “Is it relevant to specify separate drives for the data and logs in a SQL cloud environment since I/O is not the limiting factor?” So the cloud is tough because you don’t know where your disks are, who they’re hanging out with, what they’re doing. So separate drives, I’m not sure that’s going to buy you much, unless you specify – so different cloud vendors do it differently, so different sized files, different sized drives can sometimes get you better speeds. So basically, the more you pay for your drives, the better the throughput is on them.
So if you have a higher tier of drives or if you have write specific drives that you want to put tempdb or log files on, that would make sense to me. But just separating them for the sake of separating them, I don’t think that’s going to buy you much, especially because in the cloud, it’s so easy to up instance sizes, it might make a whole lot more sense to solve your storage problems by adding memory rather than add more storage or separate things out. How about you, Richie; any thoughts on cloud drives I/O things?
Richie Rump: No. I mean, it’s such a different world up in the cloud. I didn’t think separating will really buy you much, but then again, I don’t deal with a lot of the SQL-ness in the cloud. I’ve been doing a lot of managed providers where you don’t even have a choice of where the logs go; they just go.
Erik Darling: Yeah, you create a database and they go where they damn well please.
How does Development Edition licensing work?
Erik Darling: Julie asks, “Can you explain the licensing for 2014-16 SQL Server dev, especially the not for use with production data? Does that mean you can’t copy production data to the dev box for testing and development?” No, what that means is, as with all things licensing, please check with your licensing representative, whoever you buy stuff from, to make sure on this. But generally, what it means is that you cannot be serving up a production workload from that server.
So, like, you can’t have customers going in and doing stuff, but if you have your developers going in and testing new code and new indexes on there, then that’s considered development. You can use production data for that, but should you?
Richie Rump: No. You absolutely should not. There’s a lot of reasons why you shouldn’t use production data. One off, that happened to me many, many years ago, where you had production emails, or actually emails of users in there and now you start testing email functionality and emails start going out to customers from dev boxes and things like that. But, you know, the question is, do you want actual production data in the hands of your developers? Wouldn’t it be better if you actually had a set of data that you hand-crafted to go off and test certain things that doesn’t usually occur in your production data or may not be in your current production set?
There’s something to be said about having a known set of data outside of production so that you can do really some valid unit tests or valid integration, system integration, tests with. So I don’t usually recommend copying production over into a test or dev environment. Maybe if you’re doing some system testing or some other type of speed test or whatever, but even then, it gets kind of wonky because you’re probably not on the same hardware as you are on production anyway. So I don’t recommend just copying over production and saying hey, dev, go for it, you know. You need to think about it a little bit more and have dev create some data that will go through some tests. So when you are running it, hey, what about this situation or that situation or what if the data gets out of whack here? How’s the system going to go about that, especially if there’s some sort of regression in the software?
Erik Darling: There’s a lot of tools and products out there that can help you, sort of, scrub data or, like, mung stuff up so that it’s using – like you take a production dataset, you change, you flip things around, you kind of make it, you know, illegible to normal people. But that kind of comes back to another thing where it’s like, you know, you have to be very comfortable if you’re going to give developers access to that data. With the data you’re giving them access to – because for everything that people worry about with developers walking out the door with intellectual property, whatever else, walking out the door with customer data is arguably even worse because you take that to a competitor – you know, it’s a much bigger edge to have a customer list rather than, like, some stored procedure that any dingdong could write. So be careful with that too. But further to that, if you’re already giving your developers access to production to do whatever developer nonsense they have to do, you’re running into a whole different set of problems. So aside from the fact that you shouldn’t be using production data in dev, you also shouldn’t be giving your developers access to crazy production data in production. That’s how I would sum that up. I got off on a little bit of a tangent there.
Richie Rump: Well you know, that’s how I roll.
Erik Darling: So, you’re wearing a Cubs shirt. Are you looking forward to more baseball post-season fun this year?
Richie Rump: Oh, no we’re totally going to tank. We’re awful…
Erik Darling: Totally going to tank? Okay, just making sure.
Richie Rump: We lost, like the last two or last three or whatever and we’re terrible. So if we win the next two or something, we’re going to the world series, but as of right now, after last night’s game, [crosstalk].
Erik Darling: Yeah, Mets suck this year too. Just like trading people – no, go ahead, go have fun…
Richie Rump: The Mets were amazing because they were going to the World Series after 11 games, you know. They won 11 straight or something like that, then all of a sudden, everyone went to the hospital and just never came out…
Erik Darling: Aw, like grandparents.
What happens when a cluster poops the bed?
Erik Darling: Daryl says – so Daryl actually has a bunch of questions about Brent’s senior DBA class videos. I apologize that Brent is not here to answer them, so I would say, Daryl, if you have questions that you want Brent to answer about his videos, I would leave them either as comments on the videos or drop him an email with everything in there. I haven’t watched the senior DBA class videos in a while, so I don’t know that I would be able to answer them terribly well. The first one, though, is sort of answerable. Brent talks about the current vote and how it floats between both nodes. Daryl has one for each of current nodes. So the big question for me is, do you have an even number of votes or an odd number of votes? Do you have dynamic quorum or dynamic witness? Like, for me, there’s a lot more than just is each node having a vote a good idea? Like, you want to make sure that you have an odd number of votes so that, you know, the cluster will stay up if one thing kind of poops the bed in a not fun way. So I would need a little bit more information about that to give you further advisements.
Erik Darling: Tammy says, “Poops the bed in a non-fun way, as opposed to pooping the bed in a fun way.” Yes, Tammy, there are different ways to poop the bed that are varying degrees of fun. It’s a wild world out there. Let’s see, long questions, short questions, all sorts of questions…
I have these circular logic permissions problems…
Erik Darling: Ooh, Brian has a question that I think would be good for dba.stackexchange.com, “I have a circular logic issue with database ownership permissions.” Yeah, so that’s a tough problem for me to solve right here, so I would say post that on dba.stackexchange.com with as much information, as much, hopefully, obfuscated code as you’re willing to share about the issue and hopefully you will get an answer from someone who has been through that before. Any other thoughts on that?
Richie Rump: My mind’s blank on that stuff.
Erik Darling: We specifically avoid security anything in our client work because it is such a hassle and liability. And sometimes I feel even dirty when Blitz is like, these people are all sysadmins. I’m like, I don’t want to see their names, what they’re doing…
Should I have canary tables in an AG?
Erik Darling: Let’s see, Daryl asks, “Should I have canary tables on my AGs?” Only if you care about knowing how up to date they are when they failover, so yes. Most people have AGs thinking that they’re not going to have any data loss, or very little data loss. So I would say generally, canary tables are a good idea.
Does Entity Framework limit my tuning capabilities?
Erik Darling: Pablo says, “My dev team always blames ORM for bad performance. How does Entity Framework limit tuning?” That sounds like a Richie question.
Richie Rump: Yeah, the real problem is that it’s kind of obfuscated. So when you write a query for Entity Framework, you have to use their proprietor, or whatever, ORM and you’re using that language which is then translated into SQL which is then translated by SQL to do the plan and do all the other stuff. So yeah, it’s more of an art than I could tell you, hey, if you’re looking for this then you do that and it’s all one, two, three. You have to take a look at what’s going on in your plan, if you have any – sp_BlitzCache is probably the one that I would use first and see if you’ve got some really bad stuff going on there. And then kind of have to trace it back to the actual Entity Framework piece of code and then rework those queries in entity framework so that those are a little bit better to the SQL Server. There’s a lot of stuff that’s going on out there. I think I’ve got a few posts on brentozar.com about some Entity Framework stuff. There may be another one coming out. I think Brent may have released the hounds on another one soon.
Erik Darling: It was sitting around as a draft for like a year and a half. Like, was Richie done with that?
Richie Rump: Turns out I was and I just never looked. I mean, that wasn’t published, who knew? I’ve been busy on this constant thing…
Erik Darling: Yeah, constantly busy working on this constant thing.
Richie Rump: So yeah, it’s not because of entity framework, but a lot of times, it’s the way the developers crafted the code in their linked syntax and the way that gets translated by the Entity Framework itself. There are ways you could go and write your link code so the Entity Framework can write a better query, but it’s a trial and error thing in a lot of ways. So I would say, find your bad queries using sp_BlitzCache, trace them back to the link query, change your link query and then kind of do that back and forth so that you can actually start tuning those queries just a little better. Or, you could just drop it and say, I’m going to write a SQL statement for it or a stored procedure, and just do it that way.
Erik Darling: All sage advice. That’s advice I’d follow too…
Richie Rump: If you ever touch Entity Framework…
How can I build my dev databases faster?
Erik Darling: On that note, I’m going to mess this name up because I have never seen a name that looks like this – Terje – I’m sorry man, or woman, I don’t know how that one goes. If you want to phonetically give it to me in chat, I’ll say your name right. “Our development team complains about slow build times.” So basically, they have this server where they spin up a whole bunch of new databases to do new builds of the app, it sounds like. Sometimes the server has a whole bunch of old databases on it that need to get cleaned up, it bloats out a bunch of stuff, it’s not fun. See, in my head, I’m thinking why are you reusing a server for this? Why wouldn’t you just spin up a VM and have your new stuff all roll out to that VM? But maybe I’m missing something. Richie, what do you think?
Richie Rump: I’m not even sure exactly what the question is over all that.
Erik Darling: The question is, “Wouldn’t it be better to reuse some older databases and just run schema changes instead of building brand new databases each time?”
Richie Rump: Oh, boy, okay. In a build scenario, you typically would want to build it from scratch, okay, and you have all the data and you want to reload it, so just so you make sure that everything is kind of working and there’s no – you’re not worrying about deltas. So when I run the test, the test runs and it works every time, so you want to start from scratch. So yeah, if you’ve got a slow database machine, it could be slow. I’d look for other ways to make that a little bit faster. I feel your pain. Every time that I run a build or check in something into Git for ConstantCare, it automatically builds a new database for Postgres and then it loads data and it runs, you know, 500 different, maybe even more at this point, different SQL tests and it checks a whole bunch of stuff. So I feel your pain.
Ours don’t take as long as you. I think ours take about ten minutes, but it does take a little bit of time. And the more you add, the longer it’s going to take. So if you could start looking for different servers or different ways, maybe creating a VM or having a VM ready to go, I don’t know. There are different ways, but having build servers and stuff like that, that’s a whole different ballgame than what we’re normally used to here at Brent Ozar. You’re talking about building it each and every time. Now, are you creating the server every time is really the big question, because if you’re doing that, install, yeah, that’s going to take forever. But if the server’s up and running and you’re just creating databases, yeah – I’m not sure if I answered the question or not, but…
Erik Darling: Even if you can just spin up a VM that’s already imaged to have SQL Server on it configured a certain way and then just build stuff inside that VM, I think you’d be a lot better off than trying to just abuse this one poor machine over and over again.
Richie Rump: Which is – actually, we use a system called AppVeyor. It’s in the cloud. That’s exactly what it does. It has a standard image. We add some node packages. I think we actually uninstall a node version, we install a different node version and then Postgres is already there. So we’re not installing Postgres ourselves. It’s there and we’re just saying, create this database and create these tables, create these schemas, create all this other stuff and then run some tests.
Erik Darling: Alright, that is all the questions that we have for this week. Thanks, everyone for coming, hanging out, showing up. We will see you next week. Maybe Brent will even show up from a U-Haul, we don’t know yet.
Richie Rump: More from my parent’s house next week.
Erik Darling: Hopefully I’ll still be home, so I don’t know, get the whole thing. Alright, take care, y’all.