This week, Tara and Richie discuss database corruption, CLR, DBCC issues, adding identity columns to databases, distributed transactions, tracking how recently a procedure was executed, maintenance plans, recompiling stored procs, parameter sniffing issues, troubleshooting replication with Always On, forecasting database growth and estimating capacity issues, tools for database growth forecast reports, RCSI, and much more.
If you prefer to listen to the audio:
Enjoy the Podcast?
Office Hours Webcast – 2016-10-03
Why do I have rows in suspect_pages?
Tara Kizer: Andrew has a question. I actually saw this question in the help account too. Not sure if it got answered or not or if we directed you to Stack Exchange. It’s a little lengthy, but since we don’t have that many questions, I’ll go ahead and say the question and answer it. So it’s asking about database corruption, CHECKDB versus suspect_pages, so the suspect_pages table is in MSDB database and if there’s a row in there, it indicates you’ve had past corruption. He’s saying that he ran CHECKDB against that database where the suspect_pages row is showing that there’s corruption. CHECKDB is good and there was a row for tempdb as well, but that value was from the year 2015 and tempdb gets recreated every time we restart SQL Server.
So to answer your question, the database corruption existed at some point. Maybe the restarts of tempdb has resolved it. If you haven’t moved hardware in the past three years, chances are, the database corruption issue, whatever the root cause was, may still exist, depending upon how out of date your I/O subsystem drivers, all those types of things, are. But if you had any rows in the suspect_pages table, you had database corruption. Maybe it doesn’t exist now, but if you haven’t fixed the root cause of it, it can come back. So it’s something you want to take a look at. Make sure you run CHECKDB against all databases. Do the full check, don’t just do physical-only. I know some people only run physical-only, but once you have a row in the suspect_pages table, I think it’s prudent to go ahead and run a full CHECKDB against all of them. And see how out of date you are for patches. SQL Server can cause database corruption if you’re not on the latest Service Pack and Hotfix. Remembering that, going forward, Microsoft is moving away from Service Packs, so you’ve got to apply Cumulative Update packages. And then the other side of the database corruption is usually somewhere in the I/O subsystem. It’s not necessarily a disk is bad. It can be, but it could be just out of date drivers, firmware, all that type of stuff. So make sure all that stuff gets up to date fairly frequently. You know, once every couple years is not enough.
How can I track memory usage from CLR code?
Tara Kizer: Alright, question from Christopher. This is probably going to be a Richie one to answer. So, “Developers have created a lot of code that caused external CLRs and we’re seeing memory pressure events and those CLRs get unloaded and then reloaded. Aside from an extended event profile where I just line up the times when the CLRs are unloaded and reloaded and just try to line up the poor memory-hogging queries, is there any way to see memory usage by CLR?” I thought it was going to be a Richie question, but I don’t think it is.
Richie Rump: No, I mean, I like the concept of the CLR, it’s just, I think the hard part about CLR in SQL Server is knowing when to use it and when not to use it. And sometimes, us developers, all we know is, I could put .NET code inside of SQL Server. Yeah, I could do that; that seems like a great idea. Nine times out of 10, it’s really not a good idea.
Tara Kizer: I get stuck on some clients. We’ll see CLR wait stats come up and it used to be that we’d ignore those wait stats because SQL Server is using CLR stuff and so there could be a lot of waiting time on that stuff. It’s ignorable, but there was a client who had a CLR issue. Redgate software was pointing it out and we didn’t see this issue because it was on our ignore list. So we had to take it off the ignore list, and now I see it all the time as being one of the top three waits on a system. And I always have to ask, are you using any CLR. And unfortunately, a lot of clients are not sure… I do not know… So usually, it’s SQL Server stuff, but troubleshooting it is challenging. I mean, it’s not running – you’re not going to see the stuff that it’s running. Unless it’s running stored procedures, this is all external stuff that it’s doing. So I don’t really have an answer to this question, except you need more memory.
Richie Rump: In case you didn’t know, SQL Server does use the CLR. You can’t turn it off. It’s just going to happen. If you take a look at, like, a lot of the new stuff they put in T-SQL in 2012, that is mostly CLR stuff. I mean, I could take you right back to the .NET function – oh, that’s exactly what that is and that’s how it’s using it. And my cat’s trying to crawl on my keyboard and that’s not a good thing.
Tara Kizer: Oh, and that Redgate software, the version they’re using, I’m almost positive it was the free version. So maybe try out that, the Redgate free monitoring tool, and see if it points to an issue with CLR. It might give you some more detail. I’m almost positive it’s the free version of the software.
Richie Rump: I guess the other question is that, if this is a problem, if memory is a problem, then why not just throw more memory at it? I mean, you may be spending a lot of time, hence your money, because they’re paying you for your time, and why not put that towards more memory, towards the box. Just food for thought.
Tara Kizer: Yeah, and the two ways that I look at memory as far as T-SQL goes, which may not apply to the CLR, depending upon what the CLR is doing. I’m looking at sp_BlitzCache, reads, average reads, and then the memory grant sort orders. So reads and average reads will take a look at what’s using a lot of data pages, so the buffer pool, and then memory grant is the query workspace area in RAM. So is there anything that’s giving large memory grants that maybe it’s clearing out a lot of space in RAM and now that CLR has come under pressure and it’s being unloaded. So I would look there and look at BlitzCache and reads, average reads, and memory grant sort orders.
Should I use Hekaton or SSDs?
Tara Kizer: Alright, Steve asks, “We are in the process of having an argument – discussion – about the relative merits of Hekaton and SSD. Care to chime in?” I don’t know that we’re the right people to talk about Hekaton. I mean, we’re generally against – Hekaton is in-memory OLTP, right? I always forget what the acronyms are later because after the product gets released, we don’t use that term anymore. We start using the real term…
Richie Rump: The product name.
Tara Kizer: Yeah, so Hekaton is in-memory OLTP. SSD, yeah, love SSDs, love it, especially for tempdb. But I’m not sure what the discussion about Hekaton and SSD, what that correlation is.
Richie Rump: Yeah, I played with Hekaton when it was still in beta and there were a lot of issues back then and still they’re in the back of my mind, and I don’t even know if they’re issues anymore. So you know, once you create a table on it, using in-memory was really difficult to take it off, or you couldn’t take it off at all and you had to drop the table. And there was a lot of other, you know, kind of wonky stuff. You couldn’t put some constraints on it and things like that. It was like – it’s not what you think it is. So if you’re thinking of, I just want my normal OLTP database and maybe this one table will go faster if it’s in-memory, it probably is not going to do what you think it’s going to do. I confess, SSDs are always phenomenal.
I have this failing maintenance plan…
Tara Kizer: Alright, Lee has a question, “I inherited a SQL Server and it has a DBCC job that is a maintenance plan. It is failing, but if I run the DBCC command in SSMS and loop through all the databases it works fine. I have recreated the job, but it still fails.” I mean, this is the reason I don’t use maintenance plans ever. I haven’t used maintenance plans since SQL Server 2000. And I only used it back then because that’s what log shipping used. Maintenance plans are hard to troubleshoot. Some people know how to troubleshoot them better and say that they’re okay as long as you know how to troubleshoot. But I’m just like, why do I have to learn this other thing when I can just use custom code to do the same task. So if you’ve inherited the SQL Server and you’re the one that’s going to be maintaining it going forward, put your job in place instead. Get rid of the maintenance job. Maybe it’s not even doing the task that you want it to be doing. But if it’s DBCC, it’s probably doing a CHECKDB, and I would be using Ola Hallengren’s integrity checks stored procedure instead. I would not be using maintenance plans at all.
We need to add primary keys to thousands of heaps…
Tara Kizer: Alright, Nika, “Long story, we need to add primary keys to thousands of heaped tables and 30-plus databases. What major issues may come up if we just add an identity column? How can I know which tables may need to be rebuilt.” Well, if you’re adding and identity column, it needs to be rebuilt. You can’t just add it. So that is going to be a fairly big task. Make sure you have a fairly large maintenance window for this and maybe tackle one database at a time, maybe a set of tables at a time. And if these have a lot of reads versus writes, besides just adding the primary key, make sure it’s clustered. So heaped table is a table without a clustered index, and if you have a lot of reads as compared to writes, it probably should be a clustered index table. If a lot of writes as compared to reads or if write performance is way more important that reads, keep it a heap. So primary key, non clustered, but make sure you make that decision per table. If this is an OLTP system, generally all tables will be clustered indexes, unless they’re staging tables.
Richie Rump: Yeah, I think the other thing that you need to look for is data quality. So when you’re adding primary keys, most likely, you don’t have unique indexes on there either. So you’ll probably have duplicate data in there at some point. Finding that is crazy and it takes time. And a lot of times, you’re trying to go through all these rows, like okay, which ones the real row and which one’s connected to other tables, and all this other stuff. But once you put the primary keys on, that’s an exercise you probably should step through either during or once you’ve already done all that work. Because guaranteed, if you don’t have unique indexes on there, you will have duplicate data. It just happens.
Are distributed transactions okay?
Tara Kizer: Alright, Jason asks, “Do you guys have anything against distributed transactions?” The only thing that I have against it is if it ever leaves an orphan transaction out there. So as long as it’s cleaned up the work when a rollback happens and it’s committing transactions as fast as possible, I don’t have any issues with it. As far as a DBA, I always was annoyed when java developers would want me to set up the XA distributed transactions thing. it was a manual step. I think I read, in newer versions, maybe it’s 2017 or the new version, 2019, that automatically has that set up now, since a lot of people are using it. So I was always annoyed DBAs were having to do this extra work. It was an easy task, but it was just very time-consuming across multiple servers I would have to do it on. So just making sure that rollbacks occur and not orphan transactions and the transactions are committed as fast as possible. Do you have anything on that topic, Richie?
Richie Rump: Nope, nope, thank goodness. Never needed to use it…
Tara Kizer: Yeah, it sure complicates the system involving other pieces.
Richie Rump: I think it’s one of those things that, as you’re architecting it, you always go, okay, do you need this? And you try to do everything possible not to do that. And that was actually one of those things. Distributed transactions, no, I don’t want to do that. That makes things a lot more complicated, and as architects, you’re trying to make things simpler, not more complex, even though that’s where we tend to go because we like to join lines and boxes on whiteboards. I mean, that’s our profession. But yeah, I mean, that’s one of the things that I never want to put into. It’s like, okay if we really need to, then we’ll do it. But there’s a lot of pain involved there and I’m not one to run towards pain at all.
Tara Kizer: Yeah, and ask yourself, does it really need to be synchronous as part of the – or can it be asynchronous, so it’s not in a transaction?
Richie Rump: Yeah, even banking systems, they’re not synchronous. You would think they need to be. Nope…
When was a proc last executed?
Tara Kizer: Alright, Shaun asks, “Is there any way to tell how recently a procedure has been executed?” I can’t think of anything where the system is storing this. So if you need to know this information, you know, modify the stored procedure to write this information… You could set up a job to write current activity to a table, such as every minute, let’s say. But that’s only going to catch the stuff that’s running at that moment in time and if the stored procedure is extremely fast, you may never catch it when that job runs.
Management wants me to use maintenance plans…
Tara Kizer: And Lee follows up with his maintenance plans question that, “Management wants to use maintenance plans for some reason and cannot change it.” I just would not be at that job if that were the case. If I were forced to use maintenance plans, I literally would not work there anymore. This is a showstopper for me. If I’m the one in control of the SQL Server, I’m going to be making the decisions for how the jobs are set up. Maybe I won’t have control over the domain and the operating system version. A lot of times, I’ll input for those things, but as far as jobs go, I’m going to be the one in control of those. I will be open to moving job start times, but no, maintenance plans is a complete showstopper for me.
Richie Rump: I mean, that’s curious. I’ve always, when management, when I didn’t work at the place I don’t really work at, but when management forced something technical upon us, I’d always – probably in a closed door, I wouldn’t do it in front of everyone – but ask, why? What’s the business reason why we need to use this particular feature or function? And most of the time, they had a good business reason, well we want to do this and this and down the line, this is the feature where we’re trying to go. And I’m like, okay, we’ll work it out. But sometimes, it’s like, well no, I thought that this plan thing is what everybody else used and it was a best practice. And it’s like, no, you’re misinformed. So a lot of those technical things, you know, the managers at some point were technical, and maybe four or five years have gone by and that’s no longer the case. And so we have to re-educate them and it’s like, yeah that’s not really current thinking. That’s kind of a bad idea nowadays. And what I’m using in the cloud, there’s a lot of things that when we weren’t in the cloud were good ideas that are absolutely horrible ideas in the cloud. And when we’re talking about stateless processing and things like that – so I feel you. I know where you’re at, but ask those questions first. Don’t snapback or anything, just ask for the business justifications for using these things. And if they are sound and valid, then you have to run with it. But if you have other suggestions on how to accomplish that without doing something that’s not quite kosher technically, then go ahead and offer those up.
Can I dynamically recompile long-running queries?
Tara Kizer: Teschal has a great question, “I have a job to recompile stored procs if their value of max-elapse time is greater than 0.08 milliseconds…” which I think is wow, okay. “Initially, it was for one stored procedure and the job has one step, but now it becomes multiple procs and multiple job steps. How can I make this dynamic so I don’t need to add a job step every time a proc becomes bad?” Teschal, you need to take a step back. We would create these jobs back in SQL Server 2005 and older days. We didn’t have a lot of tools available to us to troubleshoot parameter sniffing issues.
Adding a recompile job for stored procedures once they surpass a certain elapsed time, I mean, sure, maybe do that, but you need to start troubleshooting why you’re encountering parameter sniffing and look towards the newer features of SQL Server, starting with SQL Server 2008. We’ve got option recompile available, so you can do it at the query level and not at the stored proc level. I mean, obviously you’re doing sp_recompile, but option recompile. We also have option optimize for, so optimizing for a specific input parameter. I like that one a lot and I’ve done that both hardcoded as well as in the plan guide.
If you’re on SQL Server 2016 or newer, we’ve got the fancy query store feature, and that would be an excellent feature for you to use. It’s got a nice interface where you can see execution plans degrade, you know, the performance degrade of a query and you can tell it, I want to use this other plan. So I would be taking a step back and not doing the approach that you’re doing. What you’re doing is what we did a while ago. Start looking into why the parameter sniffing issues are happening and what other workarounds should we put in place. Don’t default to recompiling it.
What should I measure when Always On slows down?
Tara Kizer: Alright, Pablo asks, “What should I monitor when trying network if replication with Always On gets slow?” So replication…
Richie Rump: Networking team? I don’t know…
Tara Kizer: I don’t know that I’ve ever really blamed the networking team if the replication is getting slow. What I would be looking at is the replication monitor and seeing, did we have a large transaction run? Look at the publisher to distributor history, and if you see a large amount of rows or records having to be scanned in the transaction log, you probably had something large run. Look at the distributor to subscriber, the history there, and the unpublished commands, or whatever; it’s been a while since I’ve looked at replication. Look there. You probably have a large transaction run and a lot of data now needs to move from the publisher to the distributor and the distributor to subscriber. Yes, it does go across the network, but I don’t think I’m blaming the network for that issue. Large transactions are going to cause latency for replication.
What should max memory be set to?
Tara Kizer: Thomas asks, “I’m running SQL Server 2014 Enterprise. I have max-server set for 89GB of RAM and I always have 6GB available. Is this setting okay? Do I need to turn it down a little for the OS to run?” Okay, so you told us you have max-server memory set for 89GB of RAM, but we don’t know how much RAM your actual server has. So our really quick rule of thumb is to give 4GB or 10% of RAM back to the OS and other processes. That’s just a quick rule. Jonathan Kehayias has a different type of calculation you can do. So depending upon how much RAM you have, you end up needing to give the OS and other processes a lot more RAM. I did write a dbatools PowerShell function that can do this work for you.
Okay, so he follows up and says he has 96GB of RAM. So 96 minus nine – you’re right around where we would recommend and that fact that you have 6GB always available, you’re probably okay. and maybe you could give more to SQL Server. Are you having a memory issue? I think your setting is just fine where it is.
Tara Kizer: Alright, Darshan asks, “Would you know how to forecast database growth over a period of time and how can we estimate capacity issues?” That’s something that DBAs struggle with; how much storage are we going to need in two years? What’s the growth going to be? And as a DBA, I don’t answer that question. I go back to the business and they have to tell me what their forecasts are for application features, are we going to be on-boarding a lot more customers going forward. I can tack it, and I can check out the growth of it. I can chart it out, but that doesn’t mean that a new feature is going to come out that’s going to be using a lot more data. Maybe a year from now, our sales are going to be great and we’re going to have more products, more customers in the system, things like that. So I can track it and I can forecast current growth rate, that doesn’t really tell me the growth rate of the business.
Tara Kizer: Christoper asks, “Wait, your SQL Servers have more than 4GB of RAM?” You’d be surprised what some of our clients – how small their servers are and they’re coming to us for help and we’re like – a lot of times, our replies are, even my laptop or desktop has more RAM than your little bitty server. What are you guys doing?
Richie Rump: My favorite is, my phone…
Tara Kizer: Yep, I just got the new iPhone XS. It’s way better than my old iPhone 7 – it wasn’t even that old. It’s probably a year old, and I just got the upgrade.
Richie Rump: And I have the iPhone 10 – the 10S by the way, not the XS – and I feel like it’s so old.
Tara Kizer: Some people will say 10, some people will say X – I never know which one to go by.
Richie Rump: This is less than a year old and already I’m feeling like, man, I should need a new phone. Jeez…
Should I set minimum memory?
Tara Kizer: Alright, Thomas asks, “Is there ever a reason to use minimum memory settings?” So, at one of my companies, we would set the min-server memory to half the amount of max-server memory, but this was on failover clusters where we had multiple instances, and we don’t recommend instance stacking. So if you have two instances, we would normally recommend three nodes in a cluster. But we would set that just in case a failover occurred and we wanted to make sure an instance got a minimum amount of RAM if it ever had to share. So you could do that to make sure that even on a non-failover cluster, standalone – but I would only do this if I had multiple instances, and these days, I wouldn’t have multiple instances.
Richie Rump: You know, we had a failover incident in ConstantCare on Monday afternoon. I don’t know if you saw that. So we are running in Aurora on AWS, and so we have a cluster. And when we start pegging the database, sometimes things kind of get log jammed a little bit, occasionally. And we had a failover and so we had files fail. All our data is encrypted and it just automatically failed over and I just came home that night, reset everything to run, and everything reprocessed. So those are the weird things about serverless and the cloud, that when you have these failover instances, these things that happen, you just always have to figure out how to retry it, right? So the weird cloud thing – most of the time, when we used to create stuff, it’s like, retry? We never would have re-tried something. Now, everything has to be re-tried in the cloud now. You hit that button and all of a sudden, everything fails, gets pushed back up… Failures…
Can I use PowerShell to set max memory?
Tara Kizer: Jason wanted to know the name of the max-server memory function I created over at dbatools. I don’t remember it. dbatools.io is the website – in Google, SQL Server dbatools; that will bring you to the website for the dbatools. There’s probably hundreds of functions out there. Search for all the ones that say memory. It probably is, like, set max-server memory, I just don’t remember. I tried searching for it while Richie was talking about the failovers, but unfortunately, it’s not on this specific machine that I worked on; it’s on my laptop.
Richie Rump: So you’re actually trying to get something done while I was talking about my failovers. At least something good came out of it.
Tara Kizer: I was just scrolling through Windows Explorer real quick.
Does sp_BlitzIndex check databases or indexes?
Tara Kizer: Nika asked, “Does BlitzIndex work at the database level or per index? And how long should I plan for it to run if I know there are significant issues with indexes and up to a total of 300GB in index space on the database?” So, it’s not the size of the indexes that’s going to be the problem for BlitzIndex; it’s the number of indexes. Normally, BlitzIndex doesn’t take very long to run on most people’s systems. But if you have thousands of indexes, it’s going to take a long time. I recently had a client that Blitz Index would timeout through a collector tool that we use. And even when I ran it, it was taking forever. I forget how many indexes they had, but it was in the thousands and it took like 45 minutes to run. That is not normal. It should take a minute, two minutes; it just depends on the number of indexes. And it doesn’t really matter about the significant issues either; it’s the number of them that matters. Just let it run in Management Studio though. By default, there isn’t a timeout value there.
How should I forecast database growth?
Tara Kizer: Darshan asks, “Which tool can I use to create database growth forecast reports?” I actually do have a .NET CLR object out there from many, many years ago. This is probably 10 years ago. You could use it and store the data into a table and point an Excel at it. I don’t remember what that’s called. It’s on my very, very old blog, weblogs.sqlteam.com/tarad – D as in my maiden name initial. It’s out there, I think it’s called like database growth. And it actually was featured in SQL Server Magazine a few years ago, but I don’t think I would use that anymore. I don’t know what I would use. It’s not something I’m looking at these days.
Richie Rump: Blitz-something, right? I don’t know, maybe…
Tara Kizer: Oh yeah, BlitzFirst can collect it. It does collect the file stat information, so you could utilize that. so if you download our First Responder Kit, the PowerBI dashboard can read the Blitz tables. I think there’s a script to set up the jobs to run BlitzFirst every 15 minutes. And one of the things it can collect is file stat information. So you would have to take a look, you know, write a query to call those tables. I’m not sure if the PowerBI dashboard would show you the growth chart, but at least you have the table to collect that data, or the tool to collect it.
I’m running out of worker threads due to my AG…
Tara Kizer: Alright, Celso asks, “We have a SQL 2016 Availability Group set up with local as well as WAN secondaries. We have about eight Availability Groups with about 40 total databases on them. I got…” Oh, he got excess today. Anyway, “It’s a four core server with 72GB of RAM. I ran some queries to see how many of my 512 workers were used and I notices that on the secondaries, the number of committed workers is much higher than on the primary. What gives?” I haven’t looked at the worker threads on a secondary to see what the difference is. There’s probably Availability Group worker threads in play. There is an article out there, and I don’t have the link handy but it’s probably on MSDN. It will show you how many worker threads an Availability Group can use. And four cores may not be enough for your setup if you have this many databases. The more databases you have, the more worker threads are going to be needed just for Availability Groups. And if you ever have a blocking chain occur, you will, much quicker, run into THREADPOOL waits than someone who is not running an Availability Group with lots of databases. You have to be very careful with the number of databases for a low-core server when using Availability Groups. Four cores is not a lot of cores these days.
I enabled RCSI, but it’s not working the way I want
Tara Kizer: We’ll do one more question…
Richie Rump: Ooh, look at you, the overachiever…
Tara Kizer: Alex asks, “I set read committed snapshot on, but still having problem with readers blocking writers, selects versus update according to sp_whoisactive. Am I missing something?” Alex, when you have – read committed snapshot is RCSI, read committed snapshot isolation level. It will help out with readers versus writers as far as blocking and deadlocks go, but there are times where RCSI does not get mapped, so your transactions have to be read committed. That’s why it’s called RCSI. So the RC is read committed. You have to be using read committed for RCSI to get mapped to it. If your transaction is read uncommitted, such as no lock hints, that doesn’t get mapped to it. Also, there are times where we need to escalate above RCSI behavior to make sure blocking occurs. So we’re protecting data and not allowing anybody to read or write for the specific row or page or whatever it is. So we may go to serializable isolation level for specific business cases. So you’d have to look into what’s going on with those.
If these are deadlocks, you could use our sp_BlitzLock, which will give you isolation level information. But start tracking – oh, with sp_whoisactive, add the GET_ADDITIONAL_INFO input parameter to your sp_whoisactive execution. So set that equal to one, and then there’ll be an XML column, additional info, I believe it’s called. Click on that and it will show you the isolation level of those. And then take a look at the actual queries. Are they escalated to serializable isolation level? For instance, UPDLOCK HOLDLOCK is serializable, but you could also set isolation level to serializable. So there’s things that do not get mapped to RCSI and read uncommitted and serializable do not. And somebody found the name of the PowerShell object. It is test-dbamaxmemory and set-dbamaxmemory, so there you go. Thank you, Jason and Teschal who found those. Alright, that’s the end of Office Hours. We will see you guys next week; bye.