This week, Brent, Erik, Tara, and Richie discuss cloud server disaster recovery, Availability Groups troubleshooting, SQL Server deadlocks, slow tempdb latency, migrating to Azure, SQL Server 2017 gotchas, how to set up your dev environment when looking for a job, the best allocation unit size for disks hosting data files, and a good replacement for linked servers.
Office Hours – 4-25-18
Does Azure have a good hybrid DR option?
Brent Ozar: Alright, let’s see here. Anon says, “Do y’all know if Azure has a good hybrid DR option? We’re rolling onto SQL 2014 on-premises and I was just told we’re going to need disaster recovery.” I like that, we were told we were going to need disaster recovery. “Apparently, we don’t have any and it’s just me, so I thought maybe Azure might have something better than physical servers.”
Erik Darling: I love just picturing this conversation where it’s like, this is going to be a disaster…
Brent Ozar: We need to be able to recover from this disaster. I don’t know about Azure – we actually wrote whitepapers on a technique that works across all cloud providers. I’ll point way over to the – oh, I got the wrap around thing… So Tara, about that whitepaper that you wrote for Google…
Tara Kizer: It’s a cheap solution, but it is a very valid solution: sending your backups to the cloud into a storage bucket, and then, when the time comes, spinning up a VM and getting your system restored over there. Just tack on the SQL Server component and you really can get your system up and running fairly fast. But just make sure you understand what you need to do to get your application servers and web servers back too. Make sure you’re sending your source code to the cloud also, because having your database backups is not enough to get your site back up and running.
Erik Darling: Yeah, and if your company already has, like, pretty tightly defined RPO and RTO goals, you need to make sure that there’s a separate discussion around what they’re going to be for DR, especially if you use the cheaper DR option where you’re just sending the backups up, because oftentimes it’s going to be outside what your current windows are. It’s just not going to be as immediate or as fast, because there’s manual work involved to bring stuff online.
Tara Kizer: And you can still use the whitepaper, you know, if your RPO and RTO are lower; just make sure you’re not only sending your backups to a storage bucket. Also have the VM set up and the restores happening.
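As a rough sketch of the cheap option Tara describes, on SQL Server 2014 you can back up straight to cloud blob storage with BACKUP TO URL. The storage account, container, and database names below are made up, and the exact credential syntax varies by SQL Server version (2016+ uses Shared Access Signature credentials instead), so treat this as illustrative:

```sql
-- Hypothetical names: create a credential holding the storage account's access key
-- (SQL Server 2014 style; 2016+ prefers a SAS-based credential).
CREATE CREDENTIAL AzureBackupCred
    WITH IDENTITY = 'mystorageacct',             -- storage account name
    SECRET = '<storage account access key>';

-- Back up directly to the storage container. When disaster strikes, spin up a
-- cloud VM with SQL Server and RESTORE ... FROM URL using the same credential.
BACKUP DATABASE [SalesDB]
    TO URL = 'https://mystorageacct.blob.core.windows.net/sqlbackups/SalesDB.bak'
    WITH CREDENTIAL = 'AzureBackupCred', COMPRESSION, CHECKSUM;
```

To hit a tighter RPO/RTO, as Tara notes, you would also schedule the restores on the standby VM rather than waiting for the disaster to start them.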
My app server has these errors in the event log…
Brent Ozar: Rakesh says, “I’ve got multi-subnet Always On Availability Group. If I have JDBC and it goes to connect to the listener, is it going to attempt to get all of the IPs?” Hold on a sec – let me finish answering. He also says, “My application is working fine but the log file is filled with errors.” That’s like a three-part question. Okay, let me sum up and rephrase. So he’s using Always On Availability Groups and he’s using the option where it will automatically try to connect to all of the servers at once. Is it a problem that logins are failing on the servers that aren’t the primary?
Tara Kizer: I’m confused by the question because he’s asking if it will connect to all IPs. And that’s going to be dependent upon whether your database driver supports multi-subnet failover. He says he’s using JDBC, but doesn’t provide the version number. JDBC 4.0 does offer the multiSubnetFailover=true option. So make sure you’re on JDBC 4.0 – and I would hope you are, since that’s been around for a very long time; you shouldn’t be on older stuff. And then your connection string needs to include multiSubnetFailover=true in it. I wonder if you’re getting 50% success for the – is it just the secondary servers that have the errors?
Brent Ozar: He didn’t say, but I would bet it is.
Tara Kizer: Yeah, because I don’t know that the errors on the secondary – I guess it would be trying to connect to them if you don’t have multi-subnet failover in there. So check your driver version and check your connection string. multiSubnetFailover=false is the default, so you have to turn it on. And Microsoft actually recommends that you go ahead and enable this feature even if you aren’t using a multi-subnet configuration; they indicate there are some performance benefits to using it with Availability Groups even in a single subnet.
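For reference, the option Tara mentions goes in the JDBC connection string itself. The listener and database names here are made up, but the property name is the real one from Microsoft’s JDBC driver:

```
jdbc:sqlserver://ag-listener.corp.local:1433;databaseName=SalesDB;multiSubnetFailover=true;loginTimeout=30
```

With multiSubnetFailover=true, the driver attempts connections to all of the listener’s IPs in parallel and uses whichever answers first, instead of trying them one at a time and waiting for each to time out.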
Brent Ozar: Oh, this is interesting. He follows up with, “We’re using JDBC 6.2…” Okay, that’s good. “We’re getting 100% success, but the errors are in the application log, indicating connection failure.” The application event log in the SQL Server or the app servers? If it’s the app servers, I don’t think we care. We’re database administrators.
Tara Kizer: I don’t even have access to those servers.
Brent Ozar: Yeah, Rakesh says it’s in the application server. Yeah, then don’t worry about it. As far as I’m concerned, that’s an application problem.
Tara Kizer: Filter them out.
Where can I learn about deadlocks?
Brent Ozar: Teschal asks, “Where can I get good information on SQL Server deadlocks?”
Tara Kizer: Good information…
Erik Darling: Good information is tough.
Tara Kizer: Jonathan Kehayias would be who I’d think of. He loves deadlock troubleshooting. He might be the only person in the entire world that loves it. They’re frustrating. I’m just like, have you enabled RCSI? Because that’s where I’m stuck.
Brent Ozar: It is hard. Erik actually wrote a tool around it too.
Erik Darling: Oh god, yeah. I was drunk on a plane with Brent, so if there are any errors, don’t take them personally. It’s in the First Responder Kit, so if you head over to firstresponderkit.org, there’s a stored procedure in there called sp_BlitzLock, which is, I guess, a little bit misleading because it’s actually there to look at deadlocks. You can have it look at the system health extended events session. It only works on 2012 and up because the deadlock definition changed a whole bunch in the XML, so you’re out of luck on older versions. But it will look at either the system health extended events session, or if you have a custom extended events session that captures deadlocks, you can pass that in as one of the parameters. It will give you back all sorts of cool parsed-out information about which queries did what and when and all sorts of stuff.
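A quick sketch of how that looks in practice – the file path is made up, and the parameter name may differ in your First Responder Kit release, so check its documentation:

```sql
-- Read deadlocks out of the default system_health Extended Events session (2012+):
EXEC sp_BlitzLock;

-- Or point it at a custom Extended Events session's output files;
-- @EventSessionPath and the path below are illustrative.
EXEC sp_BlitzLock @EventSessionPath = 'S:\XEvents\deadlocks*.xel';
```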
Brent Ozar: It’s really awesome. I even like it just for regular locks, not even deadlocks because, half of the time, there’s some deadlocking involved, it’s just not big and it gives you a place to go start. Since Kehayias is really hard to spell, I’ve got it up on the screen there as well. He’s got several posts about it. The Simple Talk one is fantastic.
Richie Rump: You’re IEing again.
Brent Ozar: I am. I’ve got to put Chrome on this thing. It’s so terrible. It’s so bad.
Erik Darling: You’ve got to use IE no matter what because it’s the browser you don’t use. There’s no history in it, there are no bookmarks. It’s like the only safe browser to use public-facing, and you can only Bing things. Otherwise, no, avoid Chrome in public.
Tara Kizer: I have a little bit of a funny story, since we don’t have a whole lot of questions in the queue anyway. So in February, I was trying to get a permit to a very popular place, and we went to the website a couple of minutes before 7AM when it was supposed to go live. And in Chrome – refresh, control-F5 – it would not pull up the reserve button. So I figured it was probably a caching issue, even though I’d tried all the tricks. So I pulled up Edge – you know, Microsoft Edge was on this computer – and sure enough, it worked over there, so I was able to complete my permit and get this extremely hard-to-get permit, because I hadn’t gone to the website yet in Edge. A lot of people encountered it, and it would work from their phone but not their desktop.
Brent Ozar: See, I would have thought of using my phone before I would have thought of using Edge.
Tara Kizer: I was panicking too.
Erik Darling: That happens to me a lot with Microsoft sites. Like, I’ll open them in Chrome and it will say, like, error, connection refused, or, like, connection broken, and then I open it in Edge and it comes right up. I’m like, okay, come on, guys.
Tara Kizer: See, I don’t even think about it for that; I just don’t get to go to that page today, I guess.
Brent Ozar: I get it with Books Online and TechNet stuff all the time; you have to log in to see this. Screw you.
Tara Kizer: Wow, I’ve never encountered that, huh.
I added TempDB files, and my latency went down…
Brent Ozar: Let’s see, Adrian says – he has a follow up on last week’s slow tempdb latency. He had an eight virtual core server. It had eight data files; nine seconds latency on each tempdb data file. Now that he only has one tempdb data file, the latency is only 12 milliseconds; no more sad trombone. I have a hunch that it’s not related to the number of files. I have a hunch that something else is going on. It’s possible, I just think…
Erik Darling: It seems like the opposite of the way things usually go.
Brent Ozar: It’s possible. It is possible. There’s so many…
Tara Kizer: [crosstalk] they had to restart SQL Server, so I wonder if something else happened there.
Brent Ozar: Yeah, that’s what I’m thinking too, like patching or just a parameter sniffing issue.
My company wants to migrate from SQL Server to CosmosDB…
Brent Ozar: Phil says, “My company wants to migrate SQL Server up to…” What the…
Erik Darling: No.
Brent Ozar: God bless you, Phil, this is a good question. “My company wants to migrate some SQL Server databases to Azure Cosmos DB in order to go faster. They use the XML data type, so they see it as a silver bullet. Are there any metrics you know of to evaluate RDBMS to DocumentDB’s suitability?”
Erik Darling: How is Richie still standing?
Tara Kizer: Just shaking his head.
Richie Rump: I mean, if you’re using XML as a document and you’re storing it in a relational database, you’re missing the point, right? You need to shred all that out and put it into a relational form. You’re using a relational database as a document data store, so you’re going to get bad performance from that, even though you can put XML indexes and all the other crazy stuff on top of it. You’re going to get better performance from a document database when you’re using documents. That’s just kind of the way it is. Now, are there any metrics or anything I know of? No. You’ve just got to put a test set together and say, here’s one and here’s the other, and then go for it – but the document database is always going to be faster for documents.
Brent Ozar: The other thing is, Azure Cosmos DB has nothing that SQL Server has. It doesn’t have any of the fancy-pants stuff that SQL Server does. Transaction logging – you get snapshot point in time backups like once every four hours and that’s it.
The thing that I would usually ask companies too is, okay, so you’re going to rewrite the data layer, right, because you now have a totally different layer going over to Cosmos DB. Of course, it’s going to be faster because you’re not going to be so drunk this time when you go and rewrite it. You’re like, oh we never should have built it that way and we never should have built this, and it’s going to be faster when you rewrite it.
Richie Rump: Yeah, you know, or you could take a look at it and say, do I have relational data? Am I storing it in XML because somebody was, like Brent said, drunk that day and whatnot, and we just stored it in XML because we were lazy, or somebody was lazy? But is this relational data – I think that question needs to be asked first, before you go and ask what’s my database going to be. Well, what kind of data do I have? Then you can answer the database question.
Have you used @@dbts?
Brent Ozar: Pablo is asking us an interesting one, “Have any of y’all used @@DBTS, the timestamp?”
Tara Kizer: No – I’m like, what is that?
Brent Ozar: No, I’ve never used this either. None of us – Pablo, we won’t be able to give you a good answer, but what I would do is ask it on dba.stackexchange.com. I’ve never seen that either, so…
Erik Darling: Just to follow up a little bit on Phil’s thing, using XML in SQL Server religiously is like throwing your database the hardest knuckleball you can possibly throw it. Whatever’s in that document is kind of unknown. Like, getting at certain paths and attributes and evaluating them is just a nightmare. This is nothing that the optimizer is good at or was built for, and there are no special rules built in to, like, make XML better for that. So if you’re querying XML in SQL Server without XML indexes, or without parsing it out first, yeah, it’s going to suck all day long.
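To make Erik’s point concrete – table and column names here are hypothetical – a query that reaches into the XML re-parses every row unless an XML index has persisted the shredded nodes:

```sql
-- Without an XML index, each .value()/.exist() call parses the blob per row:
SELECT OrderId,
       Doc.value('(/Order/Customer/@Id)[1]', 'int') AS CustomerId
FROM dbo.OrdersXml
WHERE Doc.exist('/Order[@Status="Open"]') = 1;

-- A primary XML index stores the shredded node table so the optimizer
-- can seek into it instead of parsing:
CREATE PRIMARY XML INDEX PXML_OrdersXml_Doc
    ON dbo.OrdersXml (Doc);
```

Even with the index, shredding genuinely relational attributes out into ordinary columns usually beats both options.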
Brent Ozar: Phil follows up with, “Yes, XML does suck.”
Erik Darling: JSON’s not any better.
Richie Rump: It’s not XML; it’s just the data format. I mean, SQL Server just wasn’t designed for XML. It’s a CSV – it’s just any – it’s a data format. SQL Server was designed for relational data. You put in rows and columns and then you can do other things and make it super fast. But when you don’t do that and you just throw data at a row and a column and say here go and search for something, it doesn’t like it. And it doesn’t matter if it’s XML or JSON; it’s using the same things behind the scenes.
Brent Ozar: Files, all kinds of stuff.
Erik Darling: XQuery is hard.
Brent Ozar: Says the guy who writes XQuery.
Erik Darling: I know, it’s hard. It’s not fun.
Richie Rump: You’re insane.
Any new gotchas with 2017?
Brent Ozar: Steve says, “I’m going to go install a new instance of SQL Server 2017 and it’s just a dev instance. Are there any new gotchas with 2017?”
Tara Kizer: What version is he coming from? 2012 or older? Because the new gotcha from 2014 up is the new cardinality estimator, which can cause some performance degradation for some queries. So just be aware of that if you’re coming from an older version or you are using a compatibility level that is 2012 or older.
Is it okay to force legacy cardinality estimation?
Brent Ozar: Which segues beautifully into – Jordan asks, “I ran into the new 2016 cardinality estimator issue with a script. I have several tables joined with two views.” We’re off to a good start. “It took less than two minutes on 2008 R2.” You were happy with that? “Less than two minutes on 2008 R2 and over 15 minutes on 2016. If I remove either view from the join, it goes back to two minutes. My solution right now is to use forced legacy cardinality estimation. Do you have any concerns?”
Erik Darling: Yeah, but not with the hint. I’m concerned with everything you’re doing that’s leading up to the hint. I hear joining to views and in my head I immediately see views within views within views that are joined to other views. And I know that the problem is in that process – it’s like the code that Brent always talks about that’s like 90 years old that no one wants to touch because if you move a comma it breaks. So I feel for you on that, but my concern is not with you hinting the old cardinality estimator.
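For anyone wanting to try the same fix, the hint can be scoped to just the offending query rather than the whole database. The view names below are made up for illustration:

```sql
-- Per-query (SQL Server 2016 SP1+): only this statement uses the old estimator.
SELECT o.OrderId, c.CustomerName
FROM dbo.vOrders AS o
JOIN dbo.vCustomers AS c
    ON c.CustomerId = o.CustomerId
OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));

-- Or database-wide (SQL Server 2016+), which is the bigger hammer:
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;
```

The per-query hint is usually the safer starting point, since it leaves every other query on the new estimator.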
Brent Ozar: Jordan followed up and he said, “You know, if we’re honest, the real problem is that it’s a select and a cursor, which is the real tyrant.”
Erik Darling: Is there a trigger and a function? What kind of sadistic bingo are you playing?
Tara Kizer: Non-SARGable process, yes.
Richie Rump: Please tell me the database is less than 1GB.
Erik Darling: By the way, it’s in AdventureWorks.
Brent Ozar: He gives an industry, but I’m not going to repeat that on here just in case – because the thing’s recorded, of course, and posted publicly – but yes. So does it give us concern? If it makes it go fast – obviously you specialize in duct-taping things together in order to get past the next hump, so this should be perfectly comfortable for you. It’s normal.
I have this 8-socket, 1-core-per-socket VM…
Brent Ozar: Let’s see, Robert says, “I have a VM configured with eight sockets and one core per socket.”
Erik Darling: Boo…
Tara Kizer: Well, that’s unusual.
Brent Ozar: And why boo?
Tara Kizer: You know, I’m not going to answer that, you know I won’t.
Erik Darling: So, why boo is – I mean, just immediately, whenever I see anyone who has done this, it’s always on Standard Edition and they always have cores and or memory offline. I don’t have any particular gripe with the setup outside of that. Like, as long as it fits within your licensing and everything’s online and available, I kind of don’t care. But immediately I jump to, you know, you’re on Standard Edition and two of your cores are completely offline and maybe memory – four of them are offline, sorry and maybe even memory offline as well; I don’t know.
Tara Kizer: Run sp_Blitz. At the bottom of sp_Blitz, there’s a section that will let you know if some of them are offline or not.
Richie Rump: ConstantCare would do the same thing.
Brent Ozar: It sure would. Richie and I have caught a lot of people with cores and memory offline, and that’s one of my favorite things to email people. Check this out; your server’s about to go twice as fast as soon as you – it’s incredible.
Tara Kizer: It’s in the hardware NUMA config section in sp_Blitz, just to give a hint.
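If you’d rather check the DMVs directly, the raw data sp_Blitz reads comes from views like these – no made-up names here, just standard system views:

```sql
-- Schedulers SQL Server can see but isn't licensed/allowed to use
-- show up with a status of VISIBLE OFFLINE:
SELECT parent_node_id, scheduler_id, status
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE OFFLINE';

-- Memory nodes, to sanity-check whether memory is reachable from all NUMA nodes:
SELECT memory_node_id, virtual_address_space_reserved_kb
FROM sys.dm_os_memory_nodes;
```

On a Standard Edition box with an 8-socket, 1-core-per-socket layout, the first query returning rows is the smoking gun Erik describes.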
What’s the best way to set up a lab?
Erik Darling: Niraj says, “I’m not working and I’m looking for a job. What’s the best way to set up my environment to practice until I get a job?” The only answer I have for that is set up your environment for the job you want to get. If you want to do, like, perf tuning stuff, set up whatever version of SQL Server Developer Edition you have available to you. 2016 and 2017 are totally free. Download a great big honking database like Stack Overflow and do your practice and setup stuff in there. If you’re looking for SSIS or SSRS work or just like plain old DBA work then install whatever tools you need.
If you’re looking to do, like, infrastructure or prod stuff, you know, use Hyper-V because it’s free as all get-out and you can create VMs, you know, primary domain controllers, make AGs, log shipping, mirroring, whatever. Set up your environment for the job you want. That’s the best advice I have.
Brent Ozar: I like that. There’s also the Microsoft Virtual Labs too. Open up Internet Explorer – brace yourselves, everybody. So Microsoft marketing wants to get you to get the latest certifications and play around with the new technology. So they do their own labs online that you can go through – I’m not going through all those. So there’s a bunch of self-paced labs on here that you can go through…
Erik Darling: What, has Stretch Database come up first?
Brent Ozar: Marketing is like, come on, we got to get this pig to fly.
Erik Darling: Use it, someone…
Brent Ozar: But the thing that I adore about these are they’re all free and you don’t have to use them to do what the labs want you to do. They’re just VMs, so you could use them for anything you want. You could set up database mirroring, you could set up log shipping. There is a gotcha…
Erik Darling: Mine Bitcoin…
Brent Ozar: The gotcha is that there’s no copy-paste, and they’re kind of siloed off from an internet perspective. So it makes it really hard to go download the stuff you want. But as long as what you want is included in the boxed product, it’s easy enough to play around inside there, and it’s totally free. I want to say they go away after two hours or something like that…
Erik Darling: Yeah, they have a time limit on them.
Any overhead in cross-database stored procs?
Brent Ozar: Michael asks, “Is there any overhead from calling a procedure in one database from another database? Like, if I’m in AdventureWorks and I go call something in Northwind?”
Erik Darling: Like, from a performance, finding-the-procedure perspective? It’s minimal.
Tara Kizer: As long as it’s not on another server using linked servers. It sounds like it’s on the same instance.
Brent Ozar: Then he says, “Is there any significant overhead on using one procedure to call another? We’re keeping legacy procedures in an old location while reorganizing where the current prod procs live.”
Erik Darling: No, but it can make your plan cache really confusing, because whatever stored procedure calls sub-stored procedures will get all of the resource use attributed to it. So if you have a stored procedure that calls, like, three other procs, and you run – let’s say, I don’t know, let’s go crazy – sp_BlitzCache, then you see that calling stored procedure way up at the top of the list. Just keep in mind that whatever all three of those stored procs do is going to be attributed to that calling proc. So it can make things a little confusing, but there’s generally nothing crazy insane that’s going to go wrong with it.
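For anyone unfamiliar with the pattern being discussed, a cross-database call on the same instance is just three-part naming, and the legacy-location wrapper looks something like this – database, proc, and parameter names are hypothetical:

```sql
-- Calling a proc in another database on the same instance:
EXEC Northwind.dbo.usp_GetCustomers @Region = 'West';

-- A wrapper left in the old location forwards to the new home, so legacy
-- callers keep working. Note: in the plan cache, the inner proc's resource
-- use rolls up to this outer wrapper, as Erik describes.
CREATE PROCEDURE dbo.usp_GetCustomers_Legacy
    @Region varchar(20)
AS
BEGIN
    EXEC Northwind.dbo.usp_GetCustomers @Region = @Region;
END;
```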
What’s the best NTFS allocation unit size?
Brent Ozar: And then Pablo asks, “What would be the best allocation unit size for my disks where I host my data files when I format my drives?”
Erik Darling: We’re here from 2003, I guess [crosstalk] cursors and… Whatever your vendor documentation says. If you’re using local disks, then – I can’t even remember the last time I even bothered thinking about that. Like, when I first started here and people were still using SQL Server 2003, we would, you know, look at disk allocation and…
Brent Ozar: Windows Server…
Erik Darling: Windows Server 2003, right. But yeah, we used to look at that way back then. There was some, like, DOS command, but I can’t remember – most people are on SANs or whatever else; it’s whatever the vendor docs say.
Brent Ozar: Because half the time, they’re using a different format under the covers anyway. Like, NetApp, it’s all 4K underneath.
What is a good replacement for linked servers?
Brent Ozar: Oh, Kevin. Kevin asks, “So, what is a good replacement for linked servers?”
Erik Darling: Who’s that bearded guy? That guy with the beard that was supposed to be here that has a good replacement for linked servers; what’s his name?
Brent Ozar: Oh, yes…
Erik Darling: What’s that guy?
Brent Ozar: Andy Leonard.
Erik Darling: Andy Leonard, that’s right. What’s that thing he uses? What’s that thing [crosstalk 0:20:21.2]?
Richie Rump: Reporting services or something like that?
Erik Darling: Almost…
Richie Rump: That was it; DTS packages.
Erik Darling: [crosstalk] SSIS. SSIS is a great replacement for linked servers because there’s no linking because it’s just a thing that sits there and moves data around.
Tara Kizer: I mean, it also depends on what the linked servers are doing. If you’re just using it to grab data – bulk-type stuff – then SSIS. But if it’s not a bulk-type thing, then your application should be smart enough to go to another server to grab data, and then you join the data together on the application side.
Richie Rump: Depends how much data we’re talking about here and how many transactions are involved and are we using an ORM to get all this data?
Erik Darling: Or are we?
Brent Ozar: Oh my goodness, we…
Erik Darling: Can you edit that out later? Pavel, can you blank that section? Put a do-over on that.
Brent Ozar: Ah, Kevin says, “It’s researchers pulling data into their working databases.”
Erik Darling: Oh, just use Access. I’m just kidding.
Brent Ozar: That’s different. Yeah, in that case, SSIS is magical. Like, I’d be like, totally, teach them how to start transforming the data, put it into a different style. SSIS is great for that. I would just make sure that they – and this is not a plug for Andy’s class, it’s just a plug for any kind of education – make sure that they know vaguely what they’re doing, otherwise they’re going to hook SSIS up to production and suck all the tables down every time. They’re not going to do any change detection; they’re just going to try and pull all the data down every time.
Ah, that’s why the other TempDB files aren’t getting used
Brent Ozar: Adrian says – uh-oh, this is a follow up on his tempdb, I think. He says, “I’m using SentryOne to look at data patterns as well and we can see that all the I/Os only flow into the primary data file. Now, I will do a restart and revert if there’s anything insightful I can find.”
Tara Kizer: He’s the one who mentioned the nine-second waits and – he said he hasn’t even restarted the server, but tempdb file changes don’t even take effect until you restart SQL Server, at least. So I’m not sure what’s going on.
Erik Darling: Yeah, he mentioned something earlier about 1MB auto-growths and stuff too. So I wonder if you don’t have instant file initialization turned on and your disks are, like, not able to keep up with the growths and everything, like lots of little growths.
Brent Ozar: Or the files are all different sizes. Like, you added a bunch more files and they’re all really tiny. They all need to be exactly the same size. SQL Server will do proportional fill and lean really heavily on the file with the most free space.
Erik Darling: Yeah, whatever the biggest file is will kind of black hole all the stuff going on in there.
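The fix Brent describes is to equalize the files. File names and sizes here are hypothetical – match them to your own server’s file names and workload:

```sql
-- Make every tempdb data file the same size with the same growth increment,
-- so proportional fill spreads I/O evenly instead of favoring one file:
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev,  SIZE = 4096MB, FILEGROWTH = 512MB);
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev2, SIZE = 4096MB, FILEGROWTH = 512MB);

-- Verify the current sizes (size is reported in 8KB pages):
SELECT name, size * 8 / 1024 AS size_mb, growth * 8 / 1024 AS growth_mb
FROM tempdb.sys.database_files;
```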
Brent Ozar: Perfect, well that’s it for this week. Thanks, everybody, for hanging out and we will see y’all next week on Office Hours; adios, everybody.