This week, Erik and Richie discuss TSQL classes, scalar functions, index reorganize vs rebuild, daily image backups on a fringe SQL server on VMWare, number of databases per AG, best practices for ETL operations, and more!
Enjoy the Podcast?
Office Hours Webcast – 2018-12-12
Erik Darling: So Marci asks, “Do you guys have a T-SQL advanced developer class that focuses on writing good SQL?” No, we don’t. I think the only person I know who has a class like that is Itzik Ben-Gan. I don’t know how hard it is to get in and do that stuff. We don’t have anything on writing it. We just have stuff on tuning it. So if you want stuff from scratch, you might learn something from mastering query tuning. There’s some pretty good advice on fixing queries in there, maybe not on how you should write them.
I think, at least for me, like once I got sort of deep enough into query tuning, there was enough carryover between, like, okay, now that I know how I would tune a query, I’m going to write this query in a way that I don’t have to tune it later. So I think once you learn how to fix a query then writing them from scratch, you kind of just naturally get better at that. So mastering query tuning might be a good fit for that, but if you’re really just looking for, like, how do I write query-type stuff, I would, instead of wasting a ton of money on training, I would probably just start with a few books. The same fellow who has the classes, Itzik Ben-Gan, has a few books on T-SQL querying that are just wonderful. One of them is a collaboration with a bunch of smart people like Adam Machanic and stuff.
So I would just get those books and not read them cover to cover, but as you’re doing something, check the contents and see if there’s a chapter or a couple of pages on what you’re actually doing. And then see if you can get any help from the book. Writing queries, you know, it’s a tough thing to teach people how to do because everyone has such different starting points with it and such different needs, depending on what their data is and what they’re looking for. There’s a big difference between writing a few small joins between tables and, like, writing really complex analysis-type queries.
Richie Rump: Right, I mean, I think really the key to writing good T-SQL is writing performant SQL. And once you’ve gotten into that, this is how the engine works and this is how certain queries react to the engine, that’s how you start writing good T-SQL; understanding that cursors are bad and SARGability and those type of things. That’s how you start writing good T-SQL. If you want to get into more advanced T-SQL stuff, I wouldn’t consider that good T-SQL, just that’s kind of authoring. And getting into things like windowing functions and stuff like that, that’s all something else. It’s not necessarily good T-SQL.
Erik Darling: It’s kind of just stuff you learn how to do along the way.
Erik Darling: Christopher asks, “For scalar functions, have you ever used the option returns null on null input? I know scalars are bad, but any possible examples of drawbacks to this?” I guess if you were depending on the output of that function to concatenate into another string, then you’re looking at an additional check to see if that’s null to replace it with something so it doesn’t make your whole string null. But, you know, scalar valued functions right now are just so bad for so many things that I wouldn’t worry about that. It’s like if, you know, I can’t even think of an upbeat analogy for this holiday season that would make sense for this. It’s not going to help you or hurt you either way. Just if you’re going to return a null value, possibly sometimes, then be careful about how you’re using that null value, because nulls have a lot of surprises in them.
Erik Darling: Theo says, “TDE is hacked. Is it still safe to use certificates?” Don’t ask me this question. I am not good at security. I had a bungle of a time just using one password to change some passwords this morning. It was awful. It was not a merry Christmas. Santa left me a lump of one password coal. You don’t want to trust me with this question. Sorry, Theo, ask someone else.
Erik Darling: Brian asks, “Do you know why an index reorg would be so much slower than rebuilds on the same index or similarly sized indexes? We noticed that maintenance was taking longer on the reorgs. As a test, I changed it to rebuild…” Yeah, so reorgs are single threaded. It’s like a little PAC-MAN that kind of goes along the leaf level of your index and sort of puts things in order. Rebuilds, if you’re on Enterprise Edition, have the ability to go parallel and all this other cool stuff. Reorgs also can do something called lob compaction, which can take a long time.
I used to run into this all the time when I was dealing with relativity indexes because there were some quite large tables in there and there were some quite large indexes and a whole bunch of lob columns, because that software is built by crazy people who let end users add columns to tables on a whim. And everyone says, I need the biggest possible column, I have no idea what’s going to show up here. So they would make all their columns nvarchar max. And then when reorgs run and they would do lob compaction, then boy howdy, that was a long time.
So the bigger your table is, the longer those reorgs take, especially if you’re doing lob compaction. I set sort of an internal threshold at around 50GB to 100GB where I’m saying screw the reorg, I’m just going to rebuild this thing if I need to. But even then, I’m not going to rebuild all that many indexes. I’m not going to do all that much index maintenance. I’m going to stick a link into the Webex howdy-doody think as soon as I find it. But I’m not that bright. I have to open up a web thing. What do they call that, goes on the internet and shows you things that people did?
Richie Rump: Facebook.
Erik Darling: Browser…
Richie Rump: The Dark Web.
Erik Darling: Yes, the Dark Web. Groupby.org… Because no one ever believes me when I say this stuff. I’m going to give you a link to Brent’s session at GroupBy about why defragmenting your indexes isn’t helping, and hopefully that will – nope, that didn’t copy or paste. This is a good day for me. Everything’s going well. Okay, so I’m going to stick that link in there. But yeah, that’s the bottom line on why rebuilds are generally faster. Reog is just, I don’t know, just kind of lightweight and wimpy. Any further thoughts on that, Richie? You don’t have to, it’s okay. There we go, I think Richie deserves some Christmas lights too.
Richie Rump: I created an index yesterday…
Erik Darling: Wow, did it make your deletes so much faster?
Richie Rump: No, it was a small table.
Erik Darling: Why did you create an index on a small table? Why are you wasting Brent’s disk money?
Richie Rump: Because it needed it.
Erik Darling: Okay, well fine then.
Erik Darling: Let’s see, I’m going to read this question first because it sounds suspicious to me. “One of our SQL admins wants to do a daily image backup on a fringe SQL Server on VMware, but I’m trying to convince him this isn’t sufficient since you want to be able to recover to at least the last hour. What are some concerns?” So I have no idea what kind of backups your sysadmin is looking to take. If it’s a snapshot of just the whole server and all the drives, it can be kind of a bummer. There are some backups – there are some VM backups, like Veeam, that will play nicely with your log backups or differential backups, so you could restore your database and then restore more backups on top of it to get you to a better point in time. But I would need more information on that before I started going off on any sort of tangent.
Erik Darling: Let’s see, Graham asks, “How many databases is an appropriate number in an AG? In my case, each database is less than 20.” So that depends a lot on how many CPUs you have and on how busy these databases are. With each database that you add in, you’re looking at adding in additional synchronization overhead, and that synchronization overhead takes CPU threads. And those CPU threads come from a kind of limited pool that SQL Server spins up based on how many CPUs you have. So really, what you want to do is think about just how much hardware you have on this server.
There’s a great post by – I’ll put the link into the chat questions – by a guy named Warwick Rudd about Availability Groups and things that you can just do massively, massively wrong with them. And it’s called What Not to do When Adding Databases to an Availability Group. And he kind of touches on a lot of this stuff. So what I would do is think, okay, if I have eight cores in this thing, I only have 576 or so worker threads that I can send out to run queries and do backups and keep these databases in sync. It’s generally not a good idea to hit THREADPOOL waits because what I’ve seen on a bunch of client servers is you have this AG and everything’s going fine, then THREADPOOL comes along and all of a sudden your error log starts filling up with messages about how the AG can’t synchronize because it couldn’t initialize a thread. So be really careful with the number of databases you stick in an AG. You can end up several creeks that probably smell like Richie’s daughter’s shoes.
Richie Rump: You don’t want that, that’s terrible.
Erik Darling: No, I seem to have lost my light. I was really liking the way this was looking, aside from the heat. Whoops, I put that in the wring window. Man, that’s hot.
Erik Darling: Okay, alright, let’s see, what’s a good new question? “Hi, what would be the best for ETL operations; a heaped table or the same table with a clustered index?” Usually, when I want to do ETL, when I’m loading data in, I don’t want any indexes. Indexes only slow me down. Usually, I just want to work with the heap and I just want to dump data in as fast as humanly possible. That’s one of the few really good things about heaps. One of the few upsides with heaps is you can just jam data in as really quickly and not have to worry about order or creating extra pages. Even a heap with a non-clustered index on top, you’re looking at dumping a bunch of data in the table and then having that and then having to dump a bunch of data in the index and it all just kind of stinks. It is not fun. So I would just generally stick with the heap on that.
Richie Rump: Yeah, I agree. And typically, I would do that, like have a heap, and then most likely I would need an index or two because I’m doing some sort of transformation on that data. But that would not be a clustered index, just because you’re constantly flowing that data through. As soon as you transform, you’re going to dump it somewhere else and then you’re going to drop that data at some point, so yeah, yippee.
Erik Darling: Let’s see here, “Have you all implemented group managed service accounts?” Absolutely not. 0% chance of that happening. Sorry, Rob, nope, I don’t do that. If you have questions about stuff like that, Denny Cherry has a book called Securing SQL Server. It’s got all sorts of good information about things that have to do with security that I am an idiot with, like one password. So please…
Erik Darling: Jim has a question, “We are getting insane page latch EX waits because of a multi-statement TVF that splits strings, but using string split directly or converting the function to an inline TVF is seven times slower.” So, Jim, this is a big question which involves code. There’s like code involved with a function that you have and other function options that you’ve tried. I would post this on dba.stackexchange.com because you could give lots of good examples. As far as why you have the page latch ex waits, it’s likely, if you have a multi-statement table valued function that’s using a table variable, you’re putting a bunch of stuff in there, that’s going to go to tempdb. That’s what the page latch ex waits are most likely going to be stemming from.
As far as a fix for that, if you’ve already tried the stuff that I would try offhand, you might want to try a different CLR version of split string. I know Adam Machanic wrote one a few years back. I don’t know where it is offhand and I think it would be detrimental for you to try to go search for it right now because all his stuff from SQLBlog has gone and moved and stuff. Post that on dba.stackexchange.com…
Richie Rump: I think he did a TechEd session about that CLR.
Erik Darling: He might have, but I don’t recall off the top of my head.
Richie Rump: Yeah, I think it’s on YouTube somewhere maybe.
Erik Darling: Probably. A lot of his stuff ends up on YouTube because he’s a very, very smart fella. Is that the right URL? That looks like the right URL. There you go, Jim, off to dba.stackexchange.com with you. Enjoy that. It will be a fun time. Everyone loves dba.stackexchange.com – it’s full of so many smart people. You might even get an answer from Paul White. Good times.
Erik Darling: Julie asks, “What controls the subject line for failed jobs set to notify the operator? I have two servers with the same job but the subject line is different.” God, it’s been so long since I had to look at that. That is an interesting piece of trivia. You know, I don’t know offhand; weird. I couldn’t tell you.
Richie Rump: Wow, I think it’s safe to say you would not be certified, sir.
Erik Darling: No, I wouldn’t.
Richie Rump: You failed the certification exam because you missed that bit of trivia.
Erik Darling: I’ll live. I’ll find a way to survive without that. This is why I wish Tara was here, because she would know that because she still probably has agent jobs, like, set alarms for her, send her messages while she’s on hikes, I don’t know. I don’t know offhand.
Erik Darling: Let’s see, Chris asks, “Any experience with SQL Server centralized management server to push policies…” Good heavens, push policies? No, no I am sorry that I sent you the link to the think about Availability Groups because that was a mis-copy and paste on my part, that went to the wrong line. But no, generally I have – something else that I just haven’t done any work with because that’s not the type of DBA that I care to be at work. Apologies for not knowing more about using a central management server to push policies out. That brings us to the very end of our questions.
Richie Rump: Really, no service broker questions for you? Because you’re a big fan of that service broker stuff…
Erik Darling: We don’t have anything on service broker or replication and that’s fine because these lights are about to make me pass out, so I’m going to…
Richie Rump: Hey, if you’re ever cold in New York over the winter, you now know what to do.
Erik Darling: Yeah, you know what, I’m going to start giving these lights out to the homeless because I’m frying. I think I’m going to pass out if I don’t take these off. So have a very merry Christmas, everyone. Happy holidays. I hope to see you as long as these lights haven’t killed me. Adios.