This week, Erik, Tara, and Richie discuss tempdb etiquette for developers, elastic search vs full-text search, server CPU spiking issues, source control in a database project, querying transaction logs, and more.
Enjoy the Podcast?
Office Hours Webcast – 2017-02-22
Erik Darling: This is a question about indexes.
[Tara’s phone rings] Richie Rump: Shhh.
Erik Darling: Who’s on call?
Tara Kizer: Sorry, it’s now on vibrate.
Erik Darling: Tara is on call for her secret DBA job that she doesn’t want to tell us about. Her real job.
Tara Kizer: This one doesn’t pay enough.
What happens if I add clustering keys as index includes?
Erik Darling: Got to get that second paycheck in. All right, “In the event that I create a non-clustered index that needs a clustered index column, is it a problem to go ahead and add it even though I pick it up as a secret column? I want to add the clustered index for housekeeping purposes so it’s evident in the definition but I’m not sure if it causes any issues.”
Tara Kizer: I’ve wondered that. It doesn’t make a difference. It’s going to get added regardless. I don’t add them to the non-clustered index. Who cares if they’re secret as long as DBAs and developers know that that’s what’s happening. You don’t need it in there for explanation reasons.
Richie Rump: But that’s where you go wrong.
Tara Kizer: Okay.
Richie Rump: Developers don’t understand that it’s there.
Tara Kizer: That’s true. Maybe they don’t even need to know that though. Do developers need to know about the secret columns?
Richie Rump: If they’re running a lot of queries it would help them to understand, “Hey, use the index, follow the indexes,” and all that other stuff. I never used to put them in there because I understood that they were implicitly. But then someone called Kendra Little, she brought up a good point, that sometimes clustered indexes change and things like that. So if you don’t put it in there, whatever column was there, it’s going to go away and then everything just may not work the same way. So go ahead and explicitly put them in there so that way in case your clustered index changes, it’s always there.
Erik Darling: There’s also another interesting part to that where when you pick up the clustered index column, is it implicitly as involved in the non-clustered index. It’s not part of the key of the index. I believe it’s just like sitting at the leaf level somewhere. It’s not in the key. So if you need that clustered index key column to do something additional within the index, and I’d probably want to have it as a key column, right? So like if you had to like join or filter on it or do something else, I would want that to be right up in the key of the index. There’s no penalty for having it there. It’s not going to be in both places but I think that especially if my query is using the clustered index key column I would want to have it as part of the non-clustered index definition. So it depends on the queries.
Richie Rump: I put them in by default now. That’s just my answer.
Erik Darling: Wow. Look at you, big boy.
Richie Rump: Yeah, man.
Erik Darling: Congratulations.
Richie Rump: Yeah, because I can’t use to design a database, that can’t happen.
Erik Darling: No. You can ask me to fix it, I won’t design it. I can fix all sorts of stuff; I just can’t design them. I’m like a good janitor, I’m not a good construction worker.
Should I change my application timeouts?
Erik Darling: Let’s go to the next question. “A user has been complaining about query timeouts which when reviewing our monitoring tool looks like high concurrent other user activity querying a specific table. I recommended for them to increase the timeout setting on their app’s end. Anything on my DBA end that can also help?” I don’t think so. Make your server better so the queries run faster.
Tara Kizer: Figure out what the root cause is. J.H. also asks, “Isn’t it true that the timeout setting is unlimited on SQL Server side?” That part is true but I’m not sure that your reaction should be to increase the application’s timeout value. Maybe there are some queries that need a longer time at value and you can set that specific one to have a higher one. But as far as the connection stream goes, I would leave it and then from a DBA perspective figure out if you need to add any indexes, you need to fine tune the query. So you asked on your DBA end, is there anything that can help, take a look at the execution plan and see where that leads you.
Erik Darling: Sounds good to me.
What’s fill factor?
Erik Darling: Michella asks, “Can you explain fill factor?” Yes. Tara loves fill factor.
Tara Kizer: Fill factor drives me crazy, for the clients that have lowered the fill factor and the column that’s in the index is just the identity column. I just don’t see why this is being one because there is no reason to have anything but 100 percent fill factor for identity columns. Fill factor is when you have free space on a page, if it’s lower than 100 percent or 0, 100 and 0 are the same thing. This allows you to be able to do inserts into the page if there’s free space in there without having to do a page split. A page split is when you have a data page that is full and you need to insert a row in between and you’re inserting a row in between because of the clustered index that’s sorted by the key. So it has to do a page split. It creates another page and it keeps 50 percent of the rows on one page and 50 percent of the rows on the second page. Then it can do the insert into the proper order. So lowering the fill factor can reduce page splits but page splits don’t happen when you have an identity column. Then when you rebuild your indexes, you’re going to get back the fill factor. After the page split happens, eventually that page might fill up as well and then you rebuild the indexes later at some point if you’re doing that kind of maintenance and you get that free space back.
Erik Darling: Fill factor isn’t preserved when you insert or do anything else to your indexes. Fill factor is only preserved after you rebuild an index, which is something that not many people know about.
Tara Kizer: I don’t like seeing servers that have set the fill factor to a lower number. I think that if you are going to lower the fill factor, it should be on an index-by-index basis. Never set a lower than 100 at the—is it at the server level or database level? I can’t remember which one it is. But don’t lower the server-level or database-level. Do it at the index level. Really, one of the only times this should happen is on GUIDs. Don’t use GUIDs.
Erik Darling: Oh, GUIDs.
Richie Rump: Well…
Erik Darling: What I always say, it’s easy to run out of ints, it’s hard to run out of GUIDs. So if you’re concerned about having—what the bigint max, Richie?
Tara Kizer: It’s pretty gigantic.
Erik Darling: [Inaudible] followed by every other number repeated twice.
Tara Kizer: I don’t know that anyone needs that. I think that GUIDs should be used when you have to keep the uniqueness across servers, different types of servers such as say a [inaudible] or something else. Even just SQL Server instances, you need to keep that row unique amongst different servers and not just this one SQL Server instance. I don’t think that using the GUID is a good choice when it’s just on that server.
Richie Rump: So, we have a new architectures coming out as we do in the developer space. Some of those need [inaudible] GUIDs going across processes, especially in microservices, right? So as a microservice, I need an ID that doesn’t overwrite something else. So that kind of follows from service to service to service and makes those hops. So it’s tough not to have kind of a GUID in there, maybe it’s not to your clustered index but probably going to need to be in the database somewhere.
Erik Darling: Yeah, that’s a good point. A lot of times when you have the GUID that you need for that uniqueness, you don’t have to have it as the clustered index. You can have a non-clustered primary key as the GUID just to have the uniqueness there and have an index on it. But you can have the clustered index on a totally append only surrogate key like an ID column or based on a sequence next value hanging out in there. Anyway, let’s move on.
Richie Rump: I did that a couple weeks ago. Look at that. Look at me. I’m amazing.
Erik Darling: No you didn’t. Stop lying. Stop using my ideas. Pretending, using my ideas and that you’re good at something.
Richie Rump: It’s not going to happen, man. Come on.
Can I use mirroring to reduce downtime for database upgrades?
Erik Darling: Christopher has a question. He’s looking to replace his SQL 2014 Enterprise server with a shiny new 2016 server. He wants to know if he can use mirroring between 2014 and 2016 to minimize downtime.
Tara Kizer: He can, he just can’t go back. You can, if you ever do failover, you can’t go back to the 2014 instance using database mirroring’s failover. I’ve certainly done it—or I should say, I haven’t used database mirroring for my upgrades. I’ve just done custom log shipping just to get the full backup, restore, maybe apply a differential, get the log chain in place. Then at the time of the maintenance window do the final log backup and then restore that final one on 2016. So I like that method, but yeah, database mirroring works as well.
Erik Darling: I’ve used mirroring and log shipping for migrations between versions. I don’t have a favorite in either case. Mirroring is nice because you can set it to asynchronous and not have to babysit anything.
Tara Kizer: That’s true.
Erik Darling: Then when you’re ready, set it to synchronous and do your manual pushover. Log shipping is nice because it does some of the work for you. Like availability groups—for some reason, I don’t know why mirroring doesn’t do it as part of the wizard. Mirroring doesn’t initialize anything for you. Like with log shipping, it’s like, do you want us to initialize this? You’re like, “Yes. I want you to take that backup and do a restore for me. Because I’m lazy.” Mirroring doesn’t for some reason. Mirroring is just like, “Did you do it, dummy? Hey, dummy, did you do it?” So, I like log shipping just for that. Plus, log shipping is so old and bulletproof.
Tara Kizer: You trust it, yeah.
Erik Darling: So I don’t know. I don’t have a favorite between the two aside from a few setup quirks. But either way, like Tara said, it is a one-way migration. So do your failover, don’t let users in immediately. Do some smoke tests. Have like some [inaudible] devs ready to try this stuff out and make sure all the connection strings and all that garbage works. Then when you’re ready, then release the Kraken. But until then, because it’s much less painful to have to just like reinitialize mirroring or log shipping to the server again if the smoke test doesn’t work than it is to try to migrate new user data back, and if there’s identity columns and parent child tables, it’s terrible. Anyway.
What do you use for I/O testing?
Erik Darling: Another question from Michella, “Do you have a go-to script for I/O tests?” No. There are tools like CrystalDiskMark and diskspd which are much better for I/O tests. If you’re looking for just a SQL DMV one, sp_BlitzFirst, if you run it, if you download sp_BlitzFirst from FirstResponderKit.org, you can run sp_BlitzFirst either since startup, then it will look at some of SQL’s internal DMVs to tell you about I/O stuff. Or, you can run if for a sample to tell you about I/O stuff. But if you just want to do a pure test of your disk, CrystalDiskMark and diskspd, there are blog posts on our site about both of them. That’s where I usually go when I need to test a disk. When I wanted to test my brand spanking new fancy 1 terabyte M2, I used CrystalDiskMark. I made clients feel bad unfairly because I was like, “Look how fast this is compared to how you do things.”
Richie Rump: [inaudible] under your desk?
Erik Darling: Yes. When you say it like that…
How long will an index operation take?
Erik Darling: Reece has a question. “Is there a way to get a rough estimate of how long an index operation will take to complete?”
Tara Kizer: Test it. Store the database on a test server and test it.
Erik Darling: Yeah, and you know, when it’s in flight, not really. It’s unfortunate. Not really.
Richie Rump: Not like it would be right, right? You can see that little progress bar and then it stops at 96 and you’re there for two days.
Do you recommend using plan guides?
Erik Darling: Nestor asks, “Do you guys ever recommend using plan guides? Do you ever troubleshoot performance issues with plan guides?”
Tara Kizer: I do. I’ve implemented them in production out of necessity. At a company we were migrating from SQL Server 2005 to 2012. As part of that upgrade, we changed the clustered index of the core table of this critical system. That’s because the clustered index was pretty wide and that’s what we needed on SQL 2000. We wanted to go to say—it wasn’t an identity column but something else. So as part of the upgrade, we did that. Well, once we let the production users in after the upgrade suddenly performance was really bad. We realized that the execution plans weren’t picking this great index. So we converted that clustered index to non-clustered so it was still there but now the execution plans, they were not selecting that index. We proved that via index hints that that index was very helpful for a lot of queries. So rather than adding index hints into the stored procedure code, which this was all of our own custom codes so we could have added the hints directly into the stored procedures, we decided to use plan guides. So we tied a plan guide to the stored procedures that told it to go ahead and use this index or sometimes we did the optimize for, just depended upon what was needed. So I’ve definitely used them. I think only one time I talked about it with a client. That was just because we were trying to override Dynamics AX’s choice of using the optimize for unknown. When you don’t have ownership of the code, the stored procedure code, you shouldn’t be adding index hints in there. But attaching a plan guide might be something that you can do. But just be warned, when you do put a plan guide on vendor code that when the vendor application gets upgraded, you’re going to want to drop those plan guides because otherwise they won’t be able to drop and recreate stored procedure. They’ll get an error.
Erik Darling: Yep. What she said. She’s smarter than me.
Tara Kizer: Well this was when I got in the Microsoft engineer that we had flown in for this major upgrade that we were doing just in case we ran into problems. After the upgrade he did training with us, so we had him onsite and looking at our production issues. He’s the reason why we put plan guides in place and after that I knew what to do.
Richie Rump: I think the key to your entire statement there is, “When we had to.” It’s not the like the first thing, it’s not the second thing. This is like the very last thing you do is put a plan guide on. There’s nothing else. That’s when we do the plan guide.
Erik Darling: True that.
Should I use a CTE or a temp table?
Erik Darling: Brian has a question about—well, it’s not really about temp tables, it’s more about CTEs. So last week we talked about using temp tables to avoid mega join queries. Brian is asking if it would be better to use a common table expression rather than a temp table, and if not, why. Does anyone have an opinion on this matter?
Richie Rump: Sometimes I think CTEs make things a little harder to read than easier to read.
Tara Kizer: They certainly do.
Richie Rump: So, there you go. That’s my answer.
Tara Kizer: I like using temp tables when you’ve got a large query that’s getting the compilation time out, so then breaking apart a lot of those joins and putting that stuff into temp tables and then just joining to the temp table can avoid your compilation timeout. When you get a compilation timeout, you might not have gotten the best execution plan for your query. You got just whatever was the best so far.
Richie Rump: Yeah, and I could put indexes on those temp tables too if I really need them. So that adds something else.
Erik Darling: It’s true. When you start joining a lot of tables together, you may have optimal indexes for some joins but not for all of them. If your query from the final product of those requires different indexes, you don’t want to have 100 indexes on your table just to satisfy all these weird edge cases. Sometimes it’s better to dump stuff to the—especially for stuff like paging queries. There are plenty of times where there are perfectly good use cases for CTEs or using offset fetch or using top with CTEs or offset fetch to do paging queries. Other times, if you are letting people order by mounds of different columns on the result, you are better off dynamically creating temp tables and indexes so that you can support that ordering on a smaller set of data. For me though, with CTEs, it comes down to a couple other things. One, CTE results aren’t materialized. So if I’m accessing the results of the CTE multiple times, it has to execute that code multiple times.
Tara Kizer: I didn’t realize that until you talked about that yesterday with a client, so that was interesting information. So if you are going to be calling that CTE table query multiple times, throw that into a temp table to avoid running that multiple times.
Erik Darling: Yeah, one of the first goofy blog posts that I wrote on this site is called “CTEs, Inline Views, and What They Do.” I’ll throw that link into the ye olde chatter box so that you can reference that. It’s a blog post just about what happens if you reference a CTE multiple times with joins or something. What happens is that code re-executes. So CTEs, while they are sometimes good, especially with the top operator for give SQL a bit of a performance pathway or whatever you want to call it. They also do have their problems. They’re not perfect. Oracle has syntax to materialize a CTE, which I wish SQL Server did as well, but for now we have temp tables, which are perfectly good for materializing sets of data.
Tara Kizer: I feel like the fact that it does re-execute the query if you are using it multiple times that that could lead to weird issues where data has come in and the first time you accessed it, it didn’t have this and now it does. For that reason, I would definitely use a temp table if you’re going to be having to query it multiple times.
Erik Darling: For sure.
Tara Kizer: Unless, of course, you’re using RCSI then it’s just going to look at the snapshot I would assume.
Erik Darling: I don’t know. Who knows.
How do I get rid of TempDB spills?
Erik Darling: Let me see if I can figure out a short way to ask this. “I’m getting a tempdb spill on one of my queries that I can’t seem to fix. I’m doing a self join which tallies up a running total from previous days” and he can’t figure out how to get rid of a spill. Well, the only way to fix a spill is with indexes. It depends on what’s spilling. You’ll have to follow up on that, if it’s a sort or a hash spill.
(Note: theoretically, you could also fix it by stuffing the server full of RAM, adding a plan guide that includes artificially high row estimates, rewriting the query to dump stuff into TempDB first, etc., but you really should start with indexes.)
Should I use OPTIMIZE FOR UNKNOWN?
Erik Darling: Victor asks, and I’m just going through in order now, “I have a proc whose plan was somehow cleared from the cache the first execution of the plan causing parameter sniffing and high parallelism. CPU of 100 percent for 40 minutes.” Wow. “When this runs then [mumbles words]. Should I optimize for unknown along with MAXDOP?” So when would you use optimize for unknown?
Tara Kizer: Never. I know that we say use it sparingly or rarely. My answer is never. There’s just better ways of doing things. Figure out what is best. How it runs best for most executions of the parameter values. See if optimize for a value is helpful or maybe an index hint, but optimize for unknown, you’re getting a mediocre execution plan, maybe. So it does guarantee that you’ll have consistent executions—is that what it is?
Erik Darling: Yeah, so much like using a local variable, it will use the density vector to give you a row estimate. So if your data—here’s the thing that sucks about optimize for unknown. If your data is already so uniform that optimize for unknown fixes your queries, you don’t need optimize for unknown in the first place. You just need better indexes. If your data is so skewed that sometimes optimize for unknown is good and sometimes it’s not, then again, you need better statistics at that point. At that point, you’re looking at either filtered indexes or filtered statistics to try to help out things along the way. Optimize for unknown is one of those really dangerous things because as your data changes, unknown stops being a good flat estimate. Sometimes optimize for unknown turns into a whole different backfiring mess. Depending on how often the code runs, I may just want to through a recompile hint on there, especially if that’s for some reason harder than adding an index. I would much rather have a recompile hint if it’s not a frequently executing piece of code.
Tara Kizer: I’ve even put the option recompile on queries that were executed frequently just because we had the CPU overhead to do that.
Richie Rump: Yep. Yep.
What on-call software do you use?
Erik Darling: Here’s a question for Tara. “What page or texting on-call software do you guys use?” Anything free.
Tara Kizer: The first company where I was in an on-call rotation was at Qualcomm. We just had so many people in IT that we just made our own. We didn’t purchase one or use a free one. We just created our own. Then the last job, we didn’t use software. The knock just reached out to—we had a list and the knock just reached out to whoever was on call that week.
Erik Darling: Knock, knock, knocking on Tara’s door.
Richie Rump: Erik, why don’t you give your phone number out there? Because that’s what we do, any problems, we just call Erik.
Erik Darling: 1-900-MixALot.
Richie Rump: “And kick those…” Back to you, Erik.
How do I deal with TempDB spills without changing my query?
Erik Darling: Dave has sort of a follow-up to his question. “How do I deal with tempdb spills without drastically changing my query, if possible?” Any volunteers on that?
Tara Kizer: I’ll let you take that one.
Erik Darling: Without changing your query, so tempdb spills happen most frequently for two operators, and sometimes for one operator. Most frequently it’s a sort or a hash operation, so a hash join or a hash aggregate will most frequently cause a tempdb spill. You can see it sometimes with parallel exchange operator, sometimes those will spill out, but not too often. So for sorts, you need to have an index that supports whatever sort you’re doing so it doesn’t spill. That’s one way to help. For hash joins, again, hash joins get used frequently when the columns involved in the join are not lead columns and indexes that SQL is using so SQL kind of makes this horrible guess at how many rows are going to be coming in for the hash operation. So without drastically changing your query, without changing your query, change the indexes. Change the indexes so that you are no longer doing things that require sorts, like sometimes if your data is medium sized, SQL might choose a merge join and just insert a sort into the plan because it thinks that based on how many rows it’s getting back it’s going to be cheaper. Presort data and do a merge join and a hash join. Most common is because there’s a weird sort or hash in your execution plan. The only way to really get a grip on those is with indexes or with the right indexes I guess.
How do I prevent people from changing my recovery model?
Erik Darling: Graham asks, “I have a vendor who changes the recovery model from full to simple without informing us. Because where I work is cheap, I’ve created custom monitoring to report when database attributes are changed. Anyway, how do I get this special vendor to stop changing the recovery model?”
Tara Kizer: Why do they have access? I’m assuming they’ve hired a third-party DBA company and so that company is helping out? I don’t know but maybe you need to question their experience level if they’re doing that because that’s breaking your RPO. You’ve lost recovery points when it gets switched to simple. You can’t restore a full backup and the entire transaction log chain if went from full to simple. You have to do a full backup once you switch back to full so you can start that log chain again. So I would talk to the vendor and maybe management to say this is an inexperienced person touching my system and we can’t hit our RPO goal because of this.
Erik Darling: Sounds good to me.
What tools can help interpret execution plans?
Erik Darling: Doug asks, “Can you suggest any tools for helping interpret execution plans?” Interpreting execution plans? No. Getting more information, better looking information, is kind of the realm of Plan Explorer, SentryOne Plan Explorer. As far as interpreting them, you know, that’s a highly personal thing. That’s just something you have to learn over time.
Richie Rump: They’ve got books for that.
Erik Darling: They do. Grant Fritchey has a good book on—I think it’s free to download from either Simple Talk or Redgate or something.
Richie Rump: Is that book free? Maybe…
Erik Darling: They have a free download of it. But I think the previous edition is free and then the newest one is paid for. Something like that. I know that Redgate and/or Simple Talk has had his Reading Execution Plans book available for download before. I don’t know if they still do. Maybe they don’t but I remember at one point getting it from them.
Can CHECKDB cause a SAN to restart?
Erik Darling: “A friend of mine has had his DBCC CHECKDB cause his SAN to restart random vhosts. He expects performance degradation and [inaudible] stability issues. Have you guys seen anything like this? Not looking for a solution, just experience.”
Tara Kizer: I haven’t seen that specific with restarting servers that are pointed to the SAN but I’ve worked with large corporations that have dedicated SAN staff and they would come to us and say, “What are you doing in the middle of night that’s causing so much I/O load?” It was always CHECKDB that was running. So they could definitely see a spike but our hardware could support it. It didn’t cause other servers to restart. CHECKDB is a very important task that needs to be done. If your SAN is causing random hosts to restart, you need to get a better SAN.
Erik Darling: I’ve actually had this happen to a physical computer before. The reason that it was happening to me was running DBCC CHECKDB on a five or so terabyte database. This was on 2008 R2. What would happen is we had 512 gigs in there but max memory was set a touch too high. Max memory was set to like 500 gigs or something. We had to knock it down to 450 because for some reason when DBCC CHECKDB ran, it would just swallow memory and Windows would give up. So it could be a memory issue. I would want to make sure that data and memory were as close to even as possible or better than they are now maybe. I don’t know. That’s a tough one to troubleshoot from afar but that’s one thing that I’ve seen, but that was a physical host, not a VM. Anyway, we are at the 45 mark.
What is SQL Intersection in Orlando?
Richie Rump: Here’s a quick question for you though. “What is SQL Intersection in Orlando? Sounds like fun.”
Erik Darling: It’s a conference where everyone learns stuff and then and goes and does Vegas things. That’s all I know because I haven’t been.
Richie Rump: Well this is Orlando so they go and do Orlando things I guess.
Erik Darling: Oh, never mind.
Tara Kizer: Aren’t there two? Yeah, there’s the one in Vegas and one in Orlando. The one in May is in Orlando.
Richie Rump: Yeah, which Brent is speaking at.
Erik Darling: Yeah, go meet Brent.
Tara Kizer: They’ve got some really good speakers at SQL Intersections.
Richie Rump: Yeah, it’s a very curated list. They don’t bring just anybody. They really bring the best of the best there. It’s pretty good.
Tara Kizer: And it sounds like SQL Intersection is more about the learning whereas PASS is a lot about networking.
Erik Darling: … hanging out, beer. All right. That’s enough. I’ll see you guys next week for Office Hours, hopefully. Thank you for showing up and goodbye.
Tara Kizer: Bye.