This week, Erik – all by his lonely self – talks about troubleshooting remote queries, his thoughts on SQL Server 2017, licensing, index rebuilds, index fragmentation, why you shouldn’t restart SQL Server every day, and much more!
Office Hours Webcast – 2017/04/26
Why doesn’t SQL Server use indexes on linked server queries?
First question here is from Michael: “Can you think of any reason indexes would be ignored or not used by a query being performed across a linked server? Server A is SQL Server with Native SQL linked to server B, server B…” Hmm, think of a reason why – well, I can think of lots of reasons why. Linked server queries are the pits; they are the absolute worst. It could – well, no, because it’s 2014 to 2012. So it depends on how patched up you are on 2012: 2012 SP1 had an important fix where you no longer needed crazy elevated permissions to use statistics and other objects on a linked server. Prior to that, you had to have SA or SA-ish superpowers on the remote server to get that.
Generally, though, it’s just really tough to troubleshoot remote queries, because SQL Server can make all sorts of weird decisions about where it’s going to do the processing – whether it runs things locally or drags data across. So sometimes that makes weird performance quirks like this really difficult to troubleshoot. There’s a really good talk by Conor Cunningham – that’s Conor with one N – at SQLBits about distributed queries. It’s not going to answer your question specifically, but it will give you a much better idea of just how complicated distributed query processing, like linked servers, really is. So once again, that’s Conor Cunningham at SQLBits – if you search the domain for Conor Cunningham and distributed query, you’ll find it a lot more easily than with the built-in search on the website.
Any thoughts on SQL Server 2017?
Let’s see here… Philip asks if I have any thoughts on SQL Server 2017. Well, of course I have lots of thoughts on it. It’s only at CTP2, but man, they are just adding so much kickass stuff already. I don’t know what they’re going to put in CTP3 that could outdo what’s in CTP2. The adaptive join stuff, even though it’s just a start – batch mode execution only, so just columnstore stuff for now – is incredible; it’s absolutely incredible what they’re doing. Then there’s interleaved execution, where they’re going to start making much better estimates for multi-statement table-valued functions – not for table variables themselves, but for multi-statement table-valued functions. Instead of just guessing that one crappy row and giving you one awful estimate, they’re going to pause cardinality estimation, actually execute the function to get a row count, and then resume cardinality estimation with the row count from the function.
So that’s pretty amazing. I don’t know how many of you out there are using multi-statement table-valued functions, but they have so many damn problems – because of table variables and because of other cardinality estimate bugaboos – that just the fact Microsoft is stepping in and doing this is pretty incredible. We also talked before about the bucketed wait stats coming to Query Store. Now you’ll be able to slice things by troubleshooting scenario – whether it’s CPU or parallelism or blocking or locking or whatever else. If you have Query Store enabled, you’ll be able to look at the wait stats associated with executions of your queries. I’ve been messing around with that a little bit lately.
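For anyone who wants to poke at those bucketed wait stats, here’s a rough sketch of what querying them looks like in 2017, assuming Query Store is enabled on the database (view and column names from the CTP documentation; tweak to taste):

```sql
-- Sketch: top wait categories per query from Query Store's bucketed
-- wait stats (SQL Server 2017+), assuming Query Store is enabled.
SELECT TOP (10)
       qt.query_sql_text,
       ws.wait_category_desc,              -- CPU, Parallelism, Lock, etc.
       SUM(ws.total_query_wait_time_ms) AS total_wait_ms
FROM sys.query_store_wait_stats AS ws
JOIN sys.query_store_plan AS p
    ON p.plan_id = ws.plan_id
JOIN sys.query_store_query AS q
    ON q.query_id = p.query_id
JOIN sys.query_store_query_text AS qt
    ON qt.query_text_id = q.query_text_id
GROUP BY qt.query_sql_text, ws.wait_category_desc
ORDER BY total_wait_ms DESC;
```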
God, what else? Linux… Just kidding, I don’t care.
Man, every time somebody tries talking about Linux I fall asleep. But there’s a whole bunch of cool new DMVs that are going to start cropping up. They’re going to come in real handy for stuff like Blitz and BlitzFirst, and there’s one that’s going to be awesome for BlitzIndex, where it looks like they’re finally starting to record the query plan hash and, I believe, the last SQL handle for missing index requests.
So we’re finally, at long last – “let my people go” – going to be able to tie missing index requests to specific queries, which is something you just can’t do now unless you go mess with the plan cache, do all sorts of awful stuff in there, and then tie it back to the missing index DMVs.
It’s just a mess trying to do that, because you have to do all this special XML processing of the missing index request to get the column names and the grouping and the ordering, and then join all that back to the missing index DMVs – and that never really belonged in BlitzCache or BlitzIndex. For BlitzIndex, we would have to add a whole bunch of XML processing overhead and hit the plan cache to look for stuff. And in BlitzCache, we would have to do a whole separate bunch of XML processing and then hit the missing index DMVs. This one DMV solves that entire problem and bridges that gap. It’s just up to you people to finally upgrade your servers.
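For context, here’s roughly what you can get today: the existing missing index DMVs join together fine, but notice there’s nothing in here tying a request back to the query that generated it – that’s the gap the new stuff fills. A sketch:

```sql
-- Sketch: today's missing index DMVs - column suggestions and usage
-- stats, but no link back to the queries that asked for the index.
SELECT d.statement AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       gs.user_seeks,
       gs.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
    ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS gs
    ON gs.group_handle = g.index_group_handle
ORDER BY gs.user_seeks * gs.avg_user_impact DESC;
```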
So there’s all sorts of stuff in 2017 I’m excited about, like we talked about a little bit in the preamble – in the lead-up while you were sitting here watching me do nothing and goof around with my headset. There’s also a special new switch coming – a query hint, I’m assuming – that will allow us to force queries to run parallel. If you were in GroupBy, Brent talked a little bit about that and showed the web page on his screen. I’m super excited about that as well.
Will a SQL 2014 license work with SQL 2008?
Let’s see, let’s find some questions – jeez. Doug has a rather awkward licensing question: he wants to know if a 2014 license is backward compatible with 2008. You know, I really try to avoid answering life-or-death licensing questions like that. Talk to whoever you bought licensing from, whether it’s a third party vendor or directly from Microsoft; talk to your sales rep and see what they have to say. I really hate trying to make sense of that kind of stuff, especially because I don’t know what deals you might have in place – what kind of grandfathering or SPLA or whatever else you have going on. Talk to whoever you bought the licenses from and get clarification there.
What are your thoughts on maintenance plans?
Let’s see here – another Doug, a different Doug; I’m going to call you Different Doug. He asks, “what are your thoughts on maintenance plan tasks such as rebuild English…” – because I can speak English so well – “rebuild index and check database integrity?” Well, you know, I think less of index rebuild tasks than I do of database integrity tasks. DBCC CHECKDB – you’ve got to run that. You’ve got to know if your databases are corrupt, because corruption just gets worse over time. It’s very rare for corruption to hit one place and stay there. Some kinds of corruption might be localized, but generally, if a disk starts going bad, that rot tends to spread and get worse, so I wouldn’t want to let it keep going over time.
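If you’re not running integrity checks anywhere yet, the minimal version is a one-liner – YourDatabase is a placeholder here:

```sql
-- Regular integrity checks: find corruption before it spreads.
-- NO_INFOMSGS keeps output quiet; ALL_ERRORMSGS shows every error.
DBCC CHECKDB (N'YourDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```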
Index rebuilds – as infrequently as possible; I’m not a terribly big fan of those. There’s a great comment on one of our blog posts from Michael J Swart where he says, “index rebuilds are just a tremendously expensive way of updating statistics,” and I couldn’t agree with a statement more. I would much rather see everyone updating statistics regularly and rigorously than doing a whole bunch of index rebuild and reorg stuff that very rarely solves problems. More often than not, figuring out which indexes are fragmented and then defragmenting them takes more time and resources than your queries would ever spend reading fragmented indexes. That’s not the most scientific statement, but it’s generally what I’ve found, especially working in environments where I was managing hundreds of terabytes of data. The amount of time I spent running those maintenance tasks would never add up to the amount of time queries spent running.
Who cares if an index is kind of fragmented? A fragmented index is a lot more like having a dirty room than some sort of performance-ending apocalyptic event, like a comet smashing into your server. So I tend to set the index rebuild and reorg thresholds way, way, way up. I say reorg at 50%, rebuild at 80%. And really, even Microsoft’s other recommendation – around 1,000 pages, which is 8MB – if you have trouble reading 8MB off disk and into memory, your server has problems that index rebuilds aren’t solving. For that, especially with older scripts, I’ll set the page count threshold way up to 5,000 or so, so I’m only dealing with real big honking tables, because those small tables may not even actually get defragmented anyway. That’s kind of what I think about that.
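Those thresholds translate into a pretty simple check against the fragmentation DMF – here’s a sketch using the numbers above (50% fragmentation, 5,000-page floor); adjust to whatever your maintenance script expects:

```sql
-- Sketch: only surface indexes big enough to matter.
-- LIMITED mode skips reading leaf pages; 5,000 pages is about 40MB.
SELECT OBJECT_NAME(ps.object_id) AS table_name,
       i.name AS index_name,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id = ps.index_id
WHERE ps.page_count > 5000
  AND ps.avg_fragmentation_in_percent > 50
ORDER BY ps.page_count DESC;
```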
How upvoted will your question be by end of day?
“Any bets on how upvoted the question you discussed in your blog post will be by the end of the day?” Which – dude, which one? I’ve got to go look now. Is there a blog post on it today? I’ll be honest, I haven’t looked at the blog today. I haven’t gotten any emails about comments, so I haven’t had a reason to, and sometimes Brent just does funny things with schedules. Oh, okay, that one – I don’t know. Hopefully fairly upvoted, because that’s a good question. I thought I would have a really good answer to it, but I kind of got beaten to the punch when I started looking at everything else going on, and then I got sucked into the startup expression predicate thing. I hope that question gets upvoted and gets some attention, because it’s a good question, an honest question.
I also hope Joe’s answer gets upvoted, because he gave a pretty good, detailed answer on what’s going on in that query. So props to both of them, because there are a lot of potato-y questions on Stack Exchange that are tough to read sometimes. You’re like, what are you doing, man? Stop that.
How interested are you in Python in SQL Server?
Let’s see here. Clark asks, “how interested are you in seeing Python within SQL Server?” Well, if it’s the same as R, where it’s just a data analysis, data-scientist-type add-on, I’m not that excited. What would really get me going is something like Postgres, or I believe Oracle, where you can write procedural code in other languages within stored procedures and functions. I would be super interested in that, because there’s some stuff that’s just way better suited to it. We have CLR, and CLR’s okay for stuff, but you have to know C#. I don’t want to learn C#; C#’s a pain in my butt.
With C# you also have to compile a DLL and add the DLL – there’s a lot of binary involved, stuff that’s scary to a frozen caveman DBA. I don’t want to do all that. I’m okay with Python from a previous job, where I used to do some text file manipulation with it as part of an ETL process. So if I could actually write stuff in Python to do, say, regex – which still isn’t in SQL Server. Now, this is one of those things: you can implement regex in CLR, and that’s usable and pretty good. And you can do some okay stuff with PATINDEX and CHARINDEX. And Microsoft, for all the cool stuff they’re adding – the STRING_SPLIT function in 2016, and STRING_AGG coming in 2017, where you can skip all that sloppy FOR XML PATH stuff when you want to make a CSV list of things.
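To show what that STRING_AGG improvement looks like, here’s the before-and-after for building a CSV list – sys.databases is just a convenient table to aggregate:

```sql
-- Old way (2016 and earlier): the sloppy FOR XML PATH trick.
SELECT STUFF((SELECT ',' + name
              FROM sys.databases
              FOR XML PATH('')), 1, 1, '') AS db_list;

-- New way (2017+): STRING_AGG says what it means.
SELECT STRING_AGG(name, ',') AS db_list
FROM sys.databases;
```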
But we still don’t have legit built-in regex in SQL Server. It would be awesome to have, and if I could port that out to writing a function or a stored procedure in Python, I would be much more excited than I am about the data scientist stuff. That said, I’m all on board with Microsoft adding it for the people who do that work. I think it’s a great feature for data scientists or aspiring data scientists or whoever else; it’s just not for me as a data science feature. If I could use it in some other way, I’d be thrilled.
If I could use Python to parse query plan XML rather than writing XQuery in T-SQL, BlitzCache would run in like a second. It would be the fastest thing ever, because I could skip so much garbage. Just as a proof of concept a while back, I messed around with importing one fairly large bit of query plan XML into Python and doing some parsing, and it was just done – it called everything out. There was none of that repeated CROSS APPLY with .nodes() and .exist() overhead; it was all very fast. Anyway, I’ve been talking for long enough.
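Just to illustrate the kind of proof-of-concept I mean: pulling operator names and row estimates out of plan XML with Python’s standard library is a few lines. The plan fragment below is a trimmed-down stand-in, not a full showplan document, but the namespace and attribute names match what real plans use:

```python
# Sketch: extracting operators and estimated rows from query plan XML
# with ElementTree instead of T-SQL XQuery. The plan below is a
# simplified stand-in for a real showplan document.
import xml.etree.ElementTree as ET

SHOWPLAN_NS = "http://schemas.microsoft.com/sqlserver/2004/07/showplan"

plan_xml = """<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <RelOp PhysicalOp="Hash Match" EstimateRows="1000">
    <RelOp PhysicalOp="Clustered Index Scan" EstimateRows="5000"/>
    <RelOp PhysicalOp="Index Seek" EstimateRows="1"/>
  </RelOp>
</ShowPlanXML>"""

def operators(plan):
    """Return (physical_op, estimated_rows) for every operator in the plan,
    in document order."""
    root = ET.fromstring(plan)
    return [(op.get("PhysicalOp"), float(op.get("EstimateRows")))
            for op in root.iter("{%s}RelOp" % SHOWPLAN_NS)]

print(operators(plan_xml))
```

No repeated `.nodes()`/`.exist()` passes – one parse, one walk over the tree.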
Why is my stored proc hanging with CXPACKET waits?

J.H. asks, “a stored proc seems to be hanging in suspended with two open transactions; it’s an UPDATE with a SELECT FROM, and it’s blocking itself and not moving at all with the CXPACKET wait type – any idea what to do?” Boy, I would need a whole lot more on that, J.H. I’m sorry, that’s just not much to go on. I would post that, with a whole bunch of supporting information, on dba.stackexchange.com. I’m not being lazy on you; there’s just not enough here for me to really tell you.
Should I restart SQL Server daily?
Let’s see here. Ujwal – I hope I said that right – asks, “does restarting SQL Server services clear the buffer pool and buffer cache?” Yes, it does. “Is it recommended to have a daily restart?” No, absolutely do not do that. “Any benefit?” Only if you have – let’s just say no. I was going to give you an edge case, but I don’t want you to say, “oh, I have that edge case,” and start doing it. No, don’t restart SQL Server every day. It’s not good; it clears out a bunch of stuff you probably want cached. SQL Server goes to great pains to cache things because it’s faster that way.
What are alternatives to linked server queries?
Doug asks, “you said linked server queries were bad; can you suggest an alternative to using linked server queries to provide a single data source to SSRS?” Tableau – just get Tableau, get a Tableau server, move data once, I don’t know. I tend to dislike linked server queries because when I’m writing a query that has to perform well, cardinality estimation has to go out and do row estimation on the other server, and I can’t count on that. What I would much rather do is write the simplest query possible to move the data over locally, and then have my query hit local data, rather than trying to run some big, crazy, expensive multi-join query with lots of WHERE clause stuff against a remote server. So get the data local, then query it.
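The shape of that pattern, with placeholder names throughout (RemoteServer, RemoteDb, and the tables are all hypothetical):

```sql
-- Sketch: pull only the rows you need across the linked server with the
-- simplest possible query, then do the real joins locally.
SELECT OrderId, CustomerId, OrderDate
INTO #RecentOrders
FROM RemoteServer.RemoteDb.dbo.Orders
WHERE OrderDate >= DATEADD(DAY, -7, GETDATE());

-- Now the optimizer works with local data and local statistics.
SELECT c.CustomerName, o.OrderDate
FROM dbo.Customers AS c
JOIN #RecentOrders AS o
    ON o.CustomerId = c.CustomerId;
```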
Should I separate SSAS/RS/IS from the engine?
M.M. asks – the gist of the question is they have physical servers and they’re moving up to 2014. The main production server uses SSAS and SSRS, they’ll have access to one new VM – what would I suggest? Generally, when I have outside components for SQL Server, I want them separated from the core engine. SSAS, SSRS, and SSIS can all be pretty big memory hogs, and resource hogs in general.
Whenever I’ve had to set up environments that needed those, I always wanted to stick them on their own server, because generally you can get away with a slightly smaller-scale server – say 2014 Standard Edition, with less RAM and fewer processors. I just don’t want that stuff walking all over my production workload, unless I’m on such a big honking server that everything can spread out and do whatever it wants. I’m just not a big fan of having everything collapse in on itself there.
Should I upgrade in place or build a new server?
Let’s see, going on down… Nester asks, “what do you recommend for version upgrades, in place or fresh installations?” Like we say at least once a week: fresh installations. Use mirroring or log shipping – or, if you have the wherewithal, availability groups – to do your cutover and minimize downtime. In-place upgrades have a lot of issues: if anything goes wrong with the install, you don’t have a backup plan. If you get there and all of a sudden performance tanks, or you bring people in to do a smoke test and kaboom – that’s it, you don’t have a way back to the previous version or even the previous hardware.
What do you think of Dynamics on Azure?

John asks, “what do you think of Microsoft Dynamics running on Azure Cloud?” I don’t, sorry man. Not the biggest fan of Dynamics, also not the biggest fan of Azure, so good luck, good luck in your endeavors.
Do I need to rebuild indexes if I use SSDs?
Justin asks, “if running on SSDs, is there any value in rebuilding indexes?” Well, Justin, like I always say: if someone claims there’s value in rebuilding indexes, I want very specific metrics on what that value is – and if you’re running on SSDs, there’s much less of it. Also, if you just spend some money on memory, you can keep your stuff in memory, and you know where index fragmentation doesn’t matter? In memory. You know why? Because memory is RAM, and RAM stands for random access memory. The index fragmentation problem is usually defined as having to randomly access stuff on disk – a whole bunch of random I/O jumping around from one spot to another. So just get as much memory as you can in a server. There you go, boom, done; you don’t have to worry about index fragmentation ever again.
How do you remember everything?
Wes asks, “how do you guys always remember people’s comments from old blog posts – who wrote what and where and all this information?” We have nothing else to think about; this is it. Back when I had a real job, I would have to keep bookmarks and take screen caps and write things down. Now that I have nothing else to think about, I just remember this stuff. It’s nice – my buffer cache is now just full of this stuff, rather than having to think of things like what am I going to do at work today, or how do I fix this problem. I just remember other things now.
How should I refresh a subset of data to dev?
M.M. asks, “is there a good resource for scripts to set up a job to store a subset of fresh prod data on the dev server? We don’t have enough room on dev to store full copies; we’ll likely be using…” Jeez, for that stuff I usually go third party – it’s not really built into SQL Server, at least not in a trivial way. Backup tools like Quest LiteSpeed or Redgate SQL Backup can do object-level restores. So if you use them for your regular backups, you could restore a subset of that backup to a dev server. I don’t know if you have to do any data cleansing or anything like that in between, but generally, for object-level restore stuff like that, I go with third party tools. I don’t want to mess with anything else.
Where’s that index fragmentation flowchart?

Let’s see here, recent blog posts – Julius asks – yeah, so if you type in Brent Ozar in Chrome you can just search the domain, look for a post called – you know what? I’ll just find it for you. Give me a minute to type. It’s a post Doug wrote, which I think has possibly the best flowchart ever created about index fragmentation. As far as in-depth, detailed stuff about why – no, not really, just because it’s very situational and it requires a lot of demo work, so it’s not really great for a blog post. Brent is working on a full-bore presentation on index fragmentation, so there’s always that.
Why should I post my question on Stack Exchange?

Ernie – copy and pasting the same question in over and over again, my friend, is not going to get me to answer it. Please, please, please, I’m begging you: post it on dba.stackexchange.com. You will have at your disposal, for free, many, many people who really want to answer your question and get those little green notifications – including a bunch of people who work for Microsoft who can tell you things about Always On availability groups that you’ll love to hear. All sorts of people write long, extensive, detailed answers; post it over there, don’t be afraid. If you email the link to your question to firstname.lastname@example.org, I will go and upvote it, so you’ll get a special green plus five from me. Please post it over there. I’m not trying to hurt your feelings; it’s just too much for this little thing. I’ve got to take questions I can read and come up with an answer for – I’m by myself, so it’s even harder.
Should I put SSAS/IS/RS on their own VMs, or one?
M.M. has a follow-up: “for breaking out SSAS and all those things, which one would you suggest moving to the smaller VM – or more than one?” If you’re using all of them hardcore, you might want them each on their own VM. If you’re using all three lightly, or for different tasks, you might be able to get away with moving them all to one machine, as long as that machine has a decent amount of resources. You also have to take into account what you’re doing with them: for SSIS, if you’re moving 512GB of data or something ridiculous, you’re going to need RAM to back that up. So just be reasonable with what you’re asking each machine to do.
What I see a lot when I’m working with clients who have these outside components all sitting on one box is that there won’t be much memory on there to begin with. Then, inevitably, they’ll say, “we hit this random slowdown every time we run a report.” So we’ll look at total server memory and target server memory, and whenever that report runs, total server memory drops way down, target server memory budges down a little, and SQL Server all of a sudden has to start reading pages back into memory from disk – so page I/O latch waits go way up. All sorts of fun stuff in there.
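Those two counters are easy to eyeball straight from the DMVs – a quick sketch:

```sql
-- Sketch: compare total vs. target server memory. When total drops well
-- below target under pressure, expect PAGEIOLATCH_* waits to climb.
SELECT counter_name,
       cntr_value / 1024 AS memory_mb
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE N'%Server Memory (KB)%';
```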
How should I merge indexes together?
Javier asks, “how do I determine which indexes are candidates to merge using the include option?” Javier, where are you, my man? My friend, we have sp_BlitzIndex. If you run that in mode four against your database, we will tell you about duplicate and borderline duplicate indexes – both are going to be helpful here. Duplicate indexes are indexes that have all of the same key columns but maybe different included columns. Borderline duplicate indexes have the same leading key column, but then different key columns, and maybe even different includes after that. So if you run sp_BlitzIndex in mode four for your database, you’ll get a list of all the indexes you could possibly want to condense, all in one place.
Just be careful, because you can end up with some pretty big indexes if you just go and consolidate everything. You have to figure out whether you’re better off with a couple of smaller indexes or one big superset index. Usually the one big superset index is a good choice, but I would urge caution when doing that stuff, especially if you have to mess with key column order… sp_BlitzIndex, mode four.
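The call itself is just:

```sql
-- Duplicate and borderline-duplicate index report for one database.
EXEC dbo.sp_BlitzIndex @DatabaseName = N'YourDatabase', @Mode = 4;
```

Swap in your own database name and work through the duplicate and borderline duplicate sections before consolidating anything.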
We’re just about at the 45-minute mark, and with me being by myself and having to eat lunch before I start a call at one, I’m going to take off a little bit early – we’re out of questions anyway. Everyone have a great week – actually a couple of weeks, because I’m going to be in France, like I said; I’m going to be on vacation for the first time ever. I will see you when I’m back. Adios, good luck.