Blog

What If Week: SQL Server Code Review

Humor, SQL Server
8 Comments

Top Secret

For the sake of any lawyers out there reading, I want to start by saying we do not now, nor have we ever possessed SQL Server’s source code.

I wouldn’t even know what to do with it if I did — I doubt it’s written in crayon.

Heck, I’m not even competent enough to use a debugger, though I have tried. Bob Ward and Paul White just make it look so glamorous.

With that out of the way, let’s pretend we did, for some reason. Say something like:

  • SQL Server went open source
  • You got a job at Microsoft working on it
  • The source code leaked

And now we can pretend that we’re competent code readers and writers and start flipping through the pages (these things are like books, right?).

Topical Thunder

This week, we’ll run a series of short blog posts asking you, our dear and beloved readers, what parts of the code you’d want to get a gander at in different scenarios.

Why this series? Why this week? Because Brent is converting us into an offshore team we’re on a cruise in Alaska, and this is a nice time for you to discuss things among yourselves without moderation. Or moderators.

Bear with me

Hopefully WordPress sends them all out on schedule!


Abstracts Open for GroupBy December

GroupBy Conference
0

Want to present at the upcoming GroupBy December 1 & 8 online conference?

If your bucket list includes speaking at a major industry conference like the PASS Summit, then GroupBy can help you get there. Your Summit resume needs to include sessions at a few events, and GroupBy’s online video archives help you really stand out amongst other applicants.


How to Drop All Your Indexes – Fast

Sometimes I need to reset stuff during performance training classes. I know some of you teach classes, too, and some of you just like doing crazy stuff.

So here you go, a stored procedure to lose weight fast:

DropIndexes for SQL Server 2016 & Newer

DropIndexes for SQL Server 2008-2014

This one doesn’t include support for the @ExceptIndexNames parameter because I’m lazy:

Look, you’re the one who reads this blog, alright? Don’t blame me for the fact that you’re – STOP COPYING THAT, AND NO, DO NOT PASTE THAT INTO PRODUCTION.

No, this one’s not going into the First Responder Kit.


sp_AllNightLog: Creating Jobs Like I’m The President

Look, we need these things

The setup for sp_AllNightLog creates jobs for four separate activities

1 job to poll for new databases to back up (primary)
1 job to poll for new databases to restore (secondary)
10 jobs to poll for backups to take (primary)
10 jobs to poll for restores to… whatever (secondary)

And, as Mr. Swart notes: “In the land of Mordor where the shadows lie.”

The number of backup and restore jobs is configurable, but we estimate that having between 4 and 20 jobs should suffice for most RPO goals.

Naming things

Economical

All jobs are attached to the same schedule, called ten_seconds. Why? Well, if a job fails completely, we want it to restart. Most jobs run in a constant loop (While @Option = 1, BEGIN…), with a fair amount of error handling. Total failures should be rare. Having them all use the same schedule was an early design choice.

Along the way, we perform various checks to make sure that our target database exists — right now that’s hardcoded to msdbCentral, which helps easily tie using this into sp_BlitzBackups to trend RPO and RTO across servers. Huzzah! We also check that Agent is running, you haven’t attempted to insert an absolutely abhorrent backup path, and some other niceties that people often overlook. If there’s anything you think we should have in here, let us know!

Configurable

We really wanted to surface most settings. There are some table names that aren’t negotiable. You can change RPO, RTO, Backup, and Restore paths easily. Either update the tables directly, or use the Setup proc with the @Update action.

We also want you to be prepared for what’s about to start happening on your server: Either a Whole Mess® of backups or restores. That’s why the Agent jobs aren’t activated from the get-go. You’ll have to do that when you know you’re ready.

Optimizations

We really only had to make one tweak to sp_DatabaseRestore, to take into account log backups that have already been restored.

Unfortunately, when you run a dir command via xp_cmdshell, there’s not a great way to filter files based on a date. That would have made the insert to table variable more efficient, sure. But we settled, and we just delete from the table variable before a certain date. This means that for a directory with thousands of log backups in it, you’re not restoring headers for all the ones you’ve done already to see if you need to do them again.

We hope you find these stored procs useful. If there’s anything we can do to make them more useful or applicable, head on over to our GitHub repo to let us know.

Thanks for reading!


Out of Office: Time for the 2017 Brent Ozar Unlimited Retreat

Company News
7 Comments
“Now I just need a big enough drink glass”

For the next 10 days, we’ll still be publishing scheduled blog posts, but we’ll be responding a little slower than usual, and probably a little more relaxed as well.

That’s right – it’s time for our annual company retreat.

This year, we’re back aboard a cruise ship – this time taking the Ruby Princess from Seattle to Alaska, with stops in Juneau, Skagway, Glacier Bay, Ketchikan, and Victoria. It’s a route Erika and I have taken several times, and we’re excited to share it with the team & their spouses.

During this week, we’ll have limited access to our email, but if you’d like to follow along with our photos, check out my Instagram feed, or check out photos from my first Alaska cruise back in 2010.

In the meantime, we left this friendly guy in charge of support.

“Tell me why you need the SA password.”

Five Mistakes Performance Tuners Make

There’s no Top in the title

And that’s because a TOP without an ORDER BY is non-deterministic, and you’ll get yelled at on the internet for doing that. This is just a short collection of things that I’ve done in the past, and still find people doing today when troubleshooting performance. Sure, this list could be a lot longer, but I only have the attention span to blog.

So in a totally non-determinstic order, here they are, just like when you write a TOP without and ORDER BY:

Only testing indexes with the query you’re tuning

Number three and climbing the B-tree! You get a query that needs to be tuned, there’s a missing index request, case closed. Right? Well, aside from the fact that you shouldn’t always believe missing index requests, there’s a bigger problem.

You guessed it!

Other queries might decide to use that index, too, and they may not perform as well when they use it. Weird, right?

When you add an index, make sure you test out other important queries that hit the table you added it to, to see if their plans change. On the flip side, you could hit performance wins for multiple queries. In which case, you should do the Ickey Shuffle into your boss’ office and demand a raise immediately.

Only testing with one query running

Coming in at number five and looking alive! Did you know that query plans can be different on busy servers? I bet not! And aside from that, your performance problem might not even be the query itself, it may be blocking, or a poison wait. This stuff may not show up in Dev, unless yous spend a lot of time and money engineering load tests.

This is what safety groups call “Situational Awareness”, and this is the kind of stuff that you really want a monitoring tool in place for. Sure, that query ran slowly, but if that’s all you know, and you can’t reproduce it, then you need to start digging deeper.

Not testing modification queries

Number two is just for you! And this is one that a lot of people miss by a country mile. Maybe even a continent mile (I’m not sure if that’s longer or shorter than an incontinent mile). SQL Server has to keep all your indexes in sync, well, synchronously. When you issue DUIs, guess what? You need to maintain nonclustered index data right alongside clustered index data.

SQL may give you a wide plan where it shows you all of them, or it may bury them all under one “nonclustered index [modification]” operator. Fun, right? All those modifications need locks, and they may need spools, and it’s really just a whole ordeal. Eventually, maintaining all those indexes may start to slow write queries down, even for a single row. This should prepare you for number three, which is…

Testing with too much RECOMPILE

Four up from the floor up! So there’s like this whole Parameter Sniffing thing. When you use RECOMPILE hints, SQL Server generates a new plan for you, every single time. This may make it seem like your code or index changes are bulletproof, because every single time you run your query, it’s fast, and the plan is awesome. As soon as you pull out that hint, you may cache one of those awesome plans, and it might not be so awesome for another set of paramiablements.

Not testing with enough RECOMPILE

The Numero Uno That You Know! So you run your query with a set of values, and the plan looks good. Then you run it with another set, and they get the same plan. It’s still fast; lucky you! But uh… what if that second set of values goes first? Do you get a different plan? Is it still fast? What happens if another set of values gets that plan?

Oh dear, you’ve shot the other foot.

Honorable Mention

I know, I know — that’s a lot to absorb all at once, and you’re feeling like you need to call old employers the way you called all your exes that time you had an ingrown hair that you weren’t sure was an ingrown hair to let them know they may have a problem that needs to get checked out. Don’t worry, like reading a WebMD article about headaches, it’s about to get worse.

  • Testing With Local Variables
  • Not testing in an environment that mimics production. What does that mean? Well, if you’re using Log Shipping, Mirroring, or AGs, all those unbatched DUIs might behave a whole heck of a lot differently once then squeak out into production.

Thanks for reading!

Brent says: another one I’ve been seeing lately is running DBCC DROPCLEANBUFFERS to “clean out memory” in between query runs. People think they’re testing more accurately because data isn’t in RAM, but that may not mimic production. In one example, the tests didn’t show any performance difference between query versions – but it was because the storage was so doggone slow that any query was horrible.


[Video] Office Hours 2017/08/02 (With Transcriptions)

This week, Brent, Erik, and Richie discuss unit testing, tSQLt, availability groups, duplicate indexes, reporting services 2016 on availability groups, troubleshooting Entity Framework queries, joining views in tables, SQL Server caching issues, and why procedures have plans inside the cache.

Here’s the video on YouTube:

Office Hours Webcast - 2017/08/02

You can register to attend next week’s Office Hours, or subscribe to our podcast to listen on the go.

Enjoy the Podcast?

Don’t miss an episode, subscribe via iTunes, Stitcher or RSS.

Leave us a review in iTunes

 

Office Hours Webcast – 2017-08-02

 

How do I sync settings across SQL Servers?

Brent Ozar: We’ll start with Rowdy. Hello, Rowdy. He says, “Hi folks, I’m migrating off a well-designed…” He’s only saying that because he doesn’t want to insult his predecessor. “Agent availability group, it’s SQL 2012 on Windows 2008 R2, it’s new hardware and a new OS  but unfortunately it won’t involve a SQL Server version change.” He says, “Are there any tools you’d use to check to make sure that all the settings and users and jobs are identical in the new availability group before cutting over?”

Erik Darling: Well, knowing Rowdy, I’m surprised that he hasn’t stumbled on DBA Tools for that because I know that they have instance comparison functions for SQL Server. I’d be surprised because I know he’s a PowerShell guy.

Brent Ozar: Erik and I were talking about this as part of the Faux Pas Project, the thing that we’re doing – these days it’s log shipping at scale but for a while, we were talking about building availability groups at scale. And the idea was could we conceptually script out all of the contents of a server, like all the metadata, trace flags, sp_configure settings, dump all that out into one file and then use it to import again into another one. It’s not out there; like the functionality to do it just isn’t there. There’s nothing to compare two servers identically. The closest two things that I’ve seen are, a Redgate schema compare will check and see the differences between DMVs, if you want it to, but of course, it’s going to produce all kinds of false positives that are just garbage; false change differences that are just garbage. Or Kendal Van Dyke wrote this SQL Power Dock that dumps out a lot of configuration into Excel. It’s built atop PowerShell. Kendal hasn’t maintained it for at least a year or two. He went to work for Microsoft and just stopped working on the project…

Richie Rump: People get busy at Microsoft, Brent, come on now.

Brent Ozar: I actually wrote a blog post about that this morning, it’s very funny how that works. But yes, so that’s a way you could go down and start, but I haven’t seen anything that catches all the trace flags, sp_configure settings, master database objects…

Erik Darling: I don’t even think there’s a good framework for it. Like, if I had to do it from a purely tSQL point of view, I couldn’t see creating a linked server or creating synonyms to all the DMVs that point to the other server and then just doing like an intersect or accept query that would catch things that are different between the two. That’s a nightmare because I would miss something; I would miss lots of things.

Brent Ozar: And even that, we’re talking about just differencing scripts. We’re not even talking about application either, you’d have to apply the differences too, and that’s even harder. Rowdy says, “I have found DBAtools.io…” So you know what I would do is, DBAtools.io has a super lively participation group in Slack. I would ask over there because I think – I’m sure that they don’t have an answer either, but I bet that they know people who would be interested in building that kind of thing, just long term.

 

Is there a benefit to duplicate indexes?

Brent Ozar: Let’s see, next up, Hannah says, “Is there any reason that I should want to keep duplicate indexes? I’m a junior DBA at a new software development company. I’m only working with developers and I’m the only DBA. They create stuff nobody has ever gone back to review or maintain. We have thousands of duplicate indexes in an instance, some because of uniqueness, some clustered with nonclustered indexes, some because there was a typo. So is there any reason I should keep those?”

Erik Darling: She mentioned a couple of the reasons that I think would be good if they were enforcing referential integrity or uniqueness. You know, if they were backing up a foreign key or something – but exact duplicates, where you see one has all the reads and the same amount of writes as another one that has none of the reads, then I would just start disabling those.

Brent Ozar: There’s one gotcha, which is if some developer has named the index in an index hint, their query will simply break. It won’t adapt and use another identical index, it will simply not work. Me, I look at that as a benefit, because you shouldn’t be hard coding index names inside your queries. So just if you do it, just disable the indexes rather than drop them. The only reason I say that is because then that way if somebody runs in and says my queries are all failing, then you can do an alter index rebuild and it’s hydrated back up in a relatively short period of time.

Richie Rump: Yeah, I love that problem, because us devs, we take a look at it, we run in a query and all of a sudden it says, hey you may be missing this index. And then we’re like, oh okay, let’s go ahead and throw that on there; let’s throw that shrimp on the barbie, I don’t know what it does, we’ll just go ahead and do it. And sure enough, it probably was there already, it just didn’t use it at that point and here we go, duplicate, duplicate, duplicate, duplicate.

Erik Darling: Now you have 40 indexes called, name of missing index SNAME.

Brent Ozar: So I go to a client and go onsite and I’m in the room with the DBA. They’re having all these performance problems, and there’s nobody in there, just me and the DBA. One of the things I run is sp_BlitzIndex and we’re looking at it and one of the tables has 174 identical missing indexes. 174 indexes, not like on different tables, 174 on the same table. So you try to keep the poker face as a consultant and I’m like, hey so what’s up with this? Can you tell me a little bit about where we got to these 174 indexes? And the developer says, well when someone brings me a slow query, I go look at it, figure out – it says indexes are recommended and then I go and add those indexes. Again, I’m trying to keep the poker face and I’m like, well but all these are identical. He says, “Yeah it always recommends that I create an index, but the name already exists; I just figured it’s the name because you’re using tools like the DTA and the missing indexes [crosstalk].”

He would just put ASDF, ASDF, you know, bang on the keyboard at the end of the index name to give himself a name. I said okay, so where are you doing this at? Where are you doing your index tuning? He says, “I’m doing it over in development. I know better than to do it over on the production environment.” And I said, “When was the last time you refreshed development from the production environment? When was the last time you restored it over?” And he stopped and thought and he goes, “I don’t think I ever have.”

So the poor guy was repeatedly recreating exactly the same index because it was missing in dev and never actually bringing it over from production. You’d be amazed at what you see out there. The good part was is I said, look I’m explaining to you what’s going wrong. It’s just you and me in this room so what we’re going to do is drop 173 of these indexes and we’re going to go back and tell your boss that we found the magic go faster button and everyone’s just going to love you to death; and they did. Everybody learns somehow.

 

How can I easily troubleshoot EF queries?

Brent Ozar: Next up, Grahame says, “I’m troubleshooting entity framework generated SQL queries. Is there a methodology or a tool I could use to make this easier? I’ve suggested that the developer start using LINQPad, but is there something else?

Richie Rump: I guess that’s my question, right? There are a couple of monitoring tools out there; I can’t think of them off the top of my head right now…

Erik Darling: ANTS, Redgate ANTS. Just kidding…

Richie Rump: Well ANTS is for memory and some other stuff. If you’re running through it, the code, it will actually give you times. Like writing your code and you’re debugging it, how long this ran and how long that ran. If you’re dealing with specific entity framework problems, it’s not going to tell you what the query was or anything like that. It’s just going to tell you that this function that you ran it in ran in 30 seconds. And then it will tell you, probably, the query in itself ran for 29 seconds. That’s what ANTS will give you. I love ANTS, I think ANTS is a great product.

There are a couple of Entity Framework tools, and it depends if you’re using Core versus Entity Framework 6; they’re essentially different products that you could go and actually monitor stuff. I’ve used them. If I needed to do an EF thing right now, I probably wouldn’t. I’d probably just go into dev, run profiler and start grabbing some of the queries from there and just start tuning some of that stuff. Probably actually run sp_BlitzIndex or something and start taking a look at the indexes, because that’s probably what it is. That’s where I go first, then I start looking at the queries themselves and see what’s going on inside of them. BlitzCache is another good one for that, if I start taking a look at what’s going on inside of the cache and where my queries are and how long things are running.

That’s probably where I would start, but there are some monitoring tools out there that you can use for Entity Framework to kind of see what’s going on inside and what’s taking so long. It’s probably some really complex queries that’s generating some suboptimal SQL and you probably just need to back away from that and use a stored procedure, or figure out what exact data you need in that connect query.

 

Any gotchas to adding SSRS to an AG?

Brent Ozar: Let’s see, Matt has a follow-up. He says, “I already have a 2016 availability group for a production website and this will just be on the same group of servers but in a different AG.” Yeah, in that case, I don’t think we’ve got anything since we don’t do reporting services work. I would just read the books online, stuff on adding availability groups in for reporting services because I want to say that there are some gotchas there. But we don’t do a whole lot of that, so you got me there.

 

How many cluster nodes can go down at once?

Brent Ozar: Gordon says, “I’m somewhat confused as to the minimum number of cluster nodes that need to be online for the cluster to still be up.” And he’s got a whole bunch of information down there about his specific scenarios. He’s got “Two nodes and a file share witness with dynamic quorum enabled, what happens if I lose one node, what happens if I lose another?” Now, unfortunately, this is beyond the scope of what I can answer fast. I actually have a whole 30-minute session on it in the senior DBA class, where I walk through the problems with dynamic quorum and dynamic witness. In theory, if you’re lucky and the right node owns the vote at the time, you can get all the way down to last man standing. If you’re unlucky, in a two node plus witness configuration, and the wrong node goes down, then you can’t get down to last man standing. So that’s why things are a little bit tricky there in terms of the articles you’ll see out there. Some of them just assume that the right node happens to own the vote. Some of them explain it, if you don’t have the right node owning the vote, you go down hard.

 

Are views slower when joined to tables?

Brent Ozar: Don asks, “I have a query that’s joining a view, then joining a table to add another column that probably should have been added in the view. Performance is terrible. Is there a known issue with joining views in tables?”

Erik Darling: No, there’s not a known issue with views in tables, but if those views call other views, like if you have nested views, you could end up with a whole weird thing. What I would want to do is start with the query plan for that and I would want to start with taking a look at that to see if there’s anything obvious jumping out at me there for why this might be so god awful slow if you’re joining just to add one column in. I know this is going to sound weird and hand wavey, but every time there’s been a case with me as a DBA where a developer has said, I’m just going to add this one column in. Two days later, this query is still running with, like, this awful key lookup and I’m sitting there like, yeah you just added that one column in, thanks, boss, saved the day.

Richie Rump: Yeah, I love the whole views on views things, because as a developer, that’s kind of what you want to build, right. Hey, I’ve got this one little piece of code and I’m going to build it on top of that, and now I have this, you know, flexibility. So if something changes here, it automatically goes up here and – all of a sudden I start nesting all these views and it makes great sense, right, from a developer’s perspective. But for performance, man, it’s just dog, it is terri-bad, just don’t even do it; don’t even think about it.

Erik Darling: I’d really want to get the difference too between the query that the view produces on its own versus the query that it produces when it’s joined to the other table because that could also give you some unique insight into what’s changing underneath.

Brent Ozar: It can be – judging that you’re asking this question too, what you probably want next is help from someone looking at the views or looking at the execution plans. This is why we made PaseThePlan.com, and by we, I mean Richie. So PasteThePlan.com is where you can go paste in execution plans on our site and then they’re uploaded and shared with the public. Now, the first thing you need to know is these are now public execution plans. Anyone can see them. So if you have stuff in there that the company’s kind of ashamed of or social security numbers, credit card numbers, you probably don’t want to paste the plan.

If it’s totally public and you don’t mind sharing them, then put both of those two execution plans on there, upload both of them and then go to DBA.StackExchange.com. And at DBA.StackExchange.com you can ask questions about SQL Server that span multiple paragraphs, include the links to Paste The Plan and people will go through and look at those differences. If your query plans are private or you’re not supposed to show any of that data, go get SQL Sentry Plan Explorer. [crosstalk]

Erik Darling: SentryOne, rebranding…

Brent Ozar: They’re going to be cursed with this for years. SentryOne’s Plan Explorer is totally free and one of the features that is has is the ability to anonymize your plan. So it takes your execution plan, replaces all the tables with object one, object two, object three. Now from there, you can take that execution plan and post it on Paste the Plan. But here’s the thing, you don’t want to bother doing that, because no one wants to read your hideous execution plan once it says things like object one, object two, field one, field two.

So instead, at the top of SentryOne’s Plan Explorer, there’s a button that says post to SQLPerformance.com. This is where people, smart people like Paul White and Aaron Bertrand are sitting around waiting for your execution plans. And because they’re paid to do it, they are much more likely to look at that heinously ugly execution plan through object one, object two, object three. They can give you advice, but know that their advice is going to be things like, you should join object three field 60 to object two field 11 in order to improve your cardinality, and it just won’t be as easy as having somebody look at the real plan.

Erik Darling: And to be fair, they are also sitting around doing the same thing on DBA.StackExchange.com for free, so the advice is way worse, because it’s free, so you really can’t ask for a refund on that. Paul gives great answers, for free.

Richie Rump: Crazy great answers. It’s like whoa, I need to clear my mind, go do some yoga, read it, go do some more yoga, meditate on what he just wrote… Yeah.

Brent Ozar: I have read whole book chapters that are easier to get through than one of Paul White’s Stack Exchange answers…

Erik Darling: But you also learned less.

Brent Ozar: By far, yeah, yeah. Whenever I see Paul White has a new answer that’s more than a page long or he posts a blog post, I’m like alright, let me clear out a spot in my calendar here because I’m going to need to focus for a while.

Erik Darling: Pressing the print button…

Brent Ozar: Can I bring this with me to the plane?

Richie Rump: You’ve got four different color pens, you know, and writing notes on it.

Brent Ozar: I find stickers to be really helpful, you know, stickers with little arrows, maybe a unicorn for the good parts.

Erik Darling: I hired some paralegals to just go through and make sure everything – cited it and annotated it and put headers and footers on all these things. I don’t feel dumb…

Brent Ozar: I still feel dumb when I finish but – no I feel smart enough.

 

Why does the plan cache use so much memory?

Brent Ozar: Kelly says, “I have one SQL Server instance, 2012 Service Pack 1…” Well first off, you should get on a supported Service Pack, because that’s not even supported anymore. “Where the optimizer accumulates and holds memory over time, maybe weeks, until there’s very little memory available for database caching. I’m seeing this via IDERA SQL Diagnostic Manager. When I restart the instance, that resolves it, but then it just restarts again accumulating more cached plans. What should I do?

Erik Darling: Stop restarting your SQL Server instance. So that sounds like a pretty natural way that SQL Server works to me. Over time you read more things into memory and all of a sudden you’re fully cached, or at least you have as much stuff cached as you can fit into memory. I’d probably want – Brent has a good article on that, it’s called the Sysadmin’s Guide to SQL Server Memory.

Brent Ozar: That’s true.

Erik Darling: That’s true, he does have that. I think, that sounds like the tree you’re barking up, where SQL Server keeps using more and more memory and you don’t like it.

Brent Ozar: I wonder, when you say that SQL Server has little memory available to cache data, I would question, what’s the problem there? Are you seeing page I/O latch waits? Page I/O latch waits means that SQL Server has to go out to storage in order to pull data back and it’s waiting for that. If you’re looking at page life expectancy and going, this is way too low I have a problem, don’t do that; that’s not the real problem there. I’m going to give you a truly horrible answer too, if you wanted to reduce the amount of memory used to cache execution plans – is anybody looking? You could run DBCC FREEPROCCACHE, which will clear out all the execution plans stored in memory. This is a spectacularly bad idea, but if you were determined to do bad things, that would be a bad thing that you could do.

Erik Darling: Also, I’d be concerned about what IDERA Diagnostic Manager is telling me, because SQL will only reserve so much space available for caching query plans, and it will evict those plans quite frequently. You see all sorts of memory pressure stuff coming in. So I guess what I would be concerned about, maybe, is maybe I have a query that’s asking for a gigantic memory grant that’s clearing out the buffer cache and the plan cache. But still, SQL is pretty aggressive about the way it chooses to cache data where it points to – you know, it will want to cache data for queries that are going to read stuff versus where it’s going to cache stuff like execution plans or DMV info or other things like that.

Brent Ozar: Brian Maynard also says, “Maybe check ad hoc plans and use optimize for ad hoc.” Yeah, that will reduce the issue. I would just kick back and go – I don’t think it’s really an issue at all.

Richie Rump: Maybe the answer is, what’s the problem?

 

Why does CHECKDB cause a restart?

Brent Ozar: Tami says, “Hi…” Hello Tami, “I have a SQL Server 2014 development instance. When I run CHECKDB on a 25GB database, it uses up most of the memory then crashes the virtual server and requires a reboot.” First, I would step back right there and go, any time something crashes, that’s probably not a SQL Server bug, there’s probably something else going on, or even a SQL Server workload issue. The crash, I might even think about opening a support call with Microsoft, because a crash shouldn’t happen regardless.

She goes on and says, “The server has 16GB of memory and about 13GB of it is allocated to SQL Server. If I reduce the allocated memory a little bit, it doesn’t crash. Does it seem right that DBCC should be able to cause this issue and is it a bug?

Erik Darling: So the answer is yes, and I actually dealt with this at my last real job where I had a server with 521GB of memory, and when I had max server memory set to 485GB, DBCC CHECKDB on a 4.5TB database would crash the server. When I set it to 475GB, it would run fine.

Brent Ozar: Get out of here. Because it got a different memory grant based off that or?

Erik Darling: I think – honestly I don’t know what it was. I didn’t have the memory grant saved from one to the other, but I’ll just tell you, that 10GB of max memory changed everything. I want to say that Windows hit some breaking point where it was just like, I can’t manage this, and quit and that was it. Because it wasn’t SQL Server crashing, it was Windows.

Richie Rump: So Windows rage-quit, is that what you’re trying to say?

Erik Darling: Basically.

Brent Ozar: She says, “If I set it to 12.2GB it doesn’t crash.”  There’s two things I would say. Make sure you’re fully patched because there was a known bug around CHECKDB’s memory grants.

Erik Darling: That’s old though…

Brent Ozar: The other thing I would say is, this is a laptop from 2012, circa 2012. It has 16GB of RAM on it. I don’t know that a VM with 16GB of RAM – I don’t know that I would go troubleshooting that much further if you’re having memory issues. I would go add RAM, and if you’re really hard up, I’ll loan you some at a conference, because this isn’t even a recent laptop…

Richie Rump: Yeah, 2013, 32.

Erik Darling: The one on my desk, 32, the one under my desk, 128.

Richie Rump: My tablet, 16.

Brent Ozar: My phone, 128… No that’s [inaudible]. My thumbnail, a micro SSD card.

Erik Darling: My water bottle…

Brent Ozar: In my pants…

 

How can I learn to troubleshoot replication?

Brent Ozar: Marcy says, “SQL Server transactional replication is so freaking hard to understand. Are there any good websites to decode it for troubleshooting?”

Erik Darling: TaraKizer.com…

Brent Ozar: You know, I don’t even know if she owns that.

Richie Rump: Well I do now…

Brent Ozar: If we weren’t on camera I would totally go buy that. I would just call Microsoft in terms of support. There are a couple of good books on Amazon, Hillary Cotter had a book, I want to say Kendal Van Dyke had a book on it as well.

Erik Darling: There’s a free one from like Redgate or something. Like Understanding Transactional Replication, I want to say. I started to read it like two or three years ago because I said, I’m going to put my nose to the grindstone and learn about replication because customers keep asking. And then I got like three pages in and said I’m not learning about transactional replication, you’re going to hear about something else instead.

Brent Ozar: It’s really funny, the people who wrote those books even have all gone on to other things in their career. There was a while there when one of the authors was emailing us and going, can you please take all replication consulting from me going forward? I was like, nope, not interested.

 

Why does my SQL Server use all of its RAM immediately on startup?

Brent Ozar: Matt says, ‘I have a server that’s got 32GB of RAM…” Like both of the laptops we just saw. “SQL Server is configured to use 24GB of RAM. When it starts, it consumes all 24GB. Is there a better way to configure memory for SQL Server? Is there a way to not have it consume all 24GB when it starts?”

Erik Darling: Why would you not?

Brent Ozar: I don’t know, I’m just the questioner here. Maybe he wants to save it for later. It’s like when I get a big [inaudible] from Wishbone Café down here in Chicago. [crosstalk]

Erik Darling: You could set max memory lower and then slowly increase it as it’s up longer, but I wouldn’t.

Brent Ozar: That’s cool…

Erik Darling: No it’s not.

Richie Rump: Maybe you could set an alarm for every hour that you could actually – every time it’s up, you could increase it 10MB…

Erik Darling: Or get one of those, like, bird things that goes like this and just hits F5. So like, every once in a while it will increase max memory.

Richie Rump: Maybe you should write a PowerShell script to do that, Erik.

Erik Darling: Yeah, that’s – I’ll get right on that, and I’ll see you guys in two years.

Brent Ozar: I would say, you may check to see if you’re using any trace flags or if you’re using lock pages in memory. There was one trace flag that involved enabling large pages. I want to say, for some reason, it was 834, I can’t quite remember, which causes you to allocate all memory on startup, which is by design…

Erik Darling: And could cause slow startup on servers with a lot of RAM. But Matt, you are not in that danger zone.

Brent Ozar: Tami who had the 16GB of memory says, “I’m being memory shamed?” Hey listen, it was much worse on my call this morning when somebody had a 1TB database and had 80GB of RAM. I was like look, here’s the deal, my desktop has 64, Erik is showing pictures of his desktop with 128…

 

 

Brent Ozar: [Tashal] points out that the free book that Erik mentioned is called Fundamentals of SQL Server 2012 Replication.

Erik Darling: There it is, written by [Tashal].

Brent Ozar: By all means, even though it says SQL Server 2012, it’s still good for 2014, 2016 and 2017, because Microsoft hasn’t put one peso of development into improving those features.

Erik Darling: Me either.

Brent Ozar: Me neither.

 

More about my ginormous plan cache

Brent Ozar: Kelly says, following up on her IDERA SQL Diagnostic Manager and plans question, she says, “Following up, most of the memory gets pinned to the optimizer, FREEPROCCACHE has no effect and ad hoc is enabled.” We’re hitting the end of the call, but the thing that I would dig into for follow up, because you could post this to Stack Exchange if you wanted. There’s a great place for that. Also include the total amount of memory on the server and the total database size. If you look at sp_Blitz, there’s an output type from markdown. Markdown is the question format that Stack Exchange uses, Google for sp_Blitz markdown and there’s instructions on how you can include that output on your Stack Exchange question, that will save a whole lot of back and forth with people asking for more details.

Erik Darling: You know, I’d probably also want to go to IDERA and figure out exactly what that means, because when you say something like memory gets pinned to the optimizer, I’m sort of baffled by like how memory gets pinned to the optimizer. Because the optimizer is not really something that memory gets pinned to, it will take a little bit of memory to compile a plan, but it’s not a memory [crosstalk], yeah.

 

Why would a proc have multiple cached plans?

Brent Ozar: Alright, and the last one we’ll take, Rowdy says, “I’ve been working on getting intended execution plans published to our development team. One of the things I’ve found when I was exporting execution plans from cache is that some of my procedures have multiple plans in cache. Why would a procedure have multiple plans inside the cache?”

Erik Darling: Lots of reasons…

Brent Ozar: Like…?

Erik Darling: So many reasons. Different white space, different things getting passed in. God, there are so many; I’m trying to think of a good one, though.

Brent Ozar: My favorite good one is – so say you have the same stored procedure and it’s called by – say one is by SSMS and one is by your app server, if you have different ANSI NULL options ARITHABORT settings, anything about your session, you’ll end up getting different execution plans cached.

Erik Darling: If you have the same stored procedure in multiple databases you’ll have different ones.

Brent Ozar: Even if it’s the same procedure in the same database, if you call it from different databases, you’ll get it. If you call it fully qualified from, say, tempdb and master and a user database, you’ll get it. And the epic blog post on this is Slow in the App, Fast in SSMS. This comes up at least once at every Office Hours, and frankly, I would say folks, any time you’re troubleshooting any kind of issue with SQL Server, whether it’s a failing cluster, always on availability groups broken, 2005 no longer under support or a query performance problem, you should start with Erland Sommarskog’s excellent post, Slow in the App, Fast in SSMS. And if you Google for that, it’s about 50 pages long, has a table of contents, but it makes your hair fuller, it brightens your skin…

Erik Darling: It’s a testosterone booster, it’s everything, it’s just amazing.

Brent Ozar: Makes grown men cry, makes young men cry, makes women laugh, yes. Well thanks everybody for hanging out with us this week at Office Hours and we will see you guys next week, adios.

Richie Rump: Can I go back to Postgres now, please?

Brent Ozar: You can go back to Postgres now.

Erik Darling: Get out.


Chaining Agent Tasks For Alerting The Right Way

Step By Step

When you think about setting up maintenance, a persistent fear is that tasks may step on each other. You probably don’t want backups and CHECKDB and statistics maintenance running at the same time as statistics maintenance (notice I didn’t say index maintenance, here, because you’re smart and you skip it?). You may also throw in some other steps, like cleanup tasks something.

Your next fear is that if one step fails, you won’t go to the next step. For instance, a Sort from Statistics maintenance may fill up TempDB, a drive for backups may run out of space (or the network path could get yoinked), or there might be some (DEARPLEASEGODDONTLETITBECORRUPTION) error associated with DBCC CHECKDB (like one of those weird NTFS fragmentation errors) or something.

At no point should one step failure lead to another step not running, so you do something like this.

On failure report success: Government work at its finest

The trouble is, when a step fails, there’s no way for you to get a notification that a step failed (unless you write some custom code within the job).

Bummerino mi amigo.

Soldier Forth

In The Know

The smart thing to do is use built-in stored procs to call other jobs.

Pants: On

This allows you to retain control over job run order, and still get emails when individual job steps fail, because each job can report and alert on failure.

Update Antony correctly points out that this will go out and run your jobs asynchronously. There’s a second part to this where I discuss how to control that further.

Thanks for reading!


Locking When There’s Nothing To Lock

Demo Day

We use StackOverflow for demos a lot. For all the reasons Brent mentions in his Great Post, Brent©, it’s pretty awesome.

Where things get tricky is with locking demos.

Sometimes the modifications can take a long time. This may be by design if you need to show long-held locks by sessions that aren’t sleeping.

Sometimes rollbacks can take a long time. After all, they’re single threaded, and you know how I feel about things that are single threaded.

The No-Row Update

To get around this, I decided to try some updates that didn’t actually update any rows. The WHERE clause excludes everything in the Users table.

I thought that SQL would outsmart me here, but it didn’t. It didn’t even try.

Here are two queries, neither of which will have any qualifying rows in my 2016/03 copy of the database. (If you’re using a more recent export, choose a more recent LastAccessDate.)

The only index on the Users table is a PK/CX on Id. That means it’s the one way in or out of the table for any queries, no matter what they’re trying to do.

So if I run an update that will also hit 0 rows, it has to use that.

No Holds Barred

While that update is running, I’ll try to run my query with the WHERE clause on LastAccessDate. It’ll get blocked — and we can confirm that by running sp_BlitzWho.

Low fiber diet

Rollback, Repeat

So look, clearly we need an index.

Maybe we need 75 indexes. If we do, we’ll run DTA and then set our pants on fire in defiance of all humanity.

But in the meantime, let’s try adding one.

This gives our select query a separate access path. Sort of. I know what you’re thinking — that SELECT * is a trick! He’s messing with us. The clustered index needs to get updated, so the key lookup will get blocked. Ha ha ha. Nice one, dummy. We’re smarter than you.

So I’ll do this instead.

This query only needs columns in our nonclustered index, so it won’t get blocked. Right?

Right?

Wrong.

The answer lies in sp_BlitzWho’s much cooler older brother, sp_WhoIsActive.

Our update query is locking the whole darn thing. Even though it changes 0 rows, and the query is done running (though it’s in a BEGIN TRAN), we’re still stuck with this lock.

I learned it from watching you!

So what to do?

Our clustered index doesn’t help, and our nonclustered index doesn’t either. Maybe our update needs an index?

It’s simple enough. We’re updating Reputation, and filtering on Reputation. Maybe we need an index on Reputation, too?

We have some options.

A separate index that leads with Reputation, and index that leads with LastAccessDate, and an index that leads with Reputation.

For the sake of blog post brevity: the index that leads with LastAccessDate doesn’t fix our blocking scenario.

So what’s better: Two separate indexes on (Reputation, Id) and (LastAccessDate, Id), or one index on (Reputation, LastAccessDate, Id)?

And then what if our UPDATE changes? What if we need to filter on LastAccessDate? Or something else? What if having LastAccessDate as the second key column makes our SELECT queries eat speed bumps?

You can’t index for everything, and as we sort-of-glossed-over, leading index key columns seem necessary to get us out of locking jams.

Sure, you could get around things with NOLOCK hints, but if that makes you queasy, it’s time to start looking into other Isolation Levels.

Indexes can be extra tricky. If you’re lucky enough to be going to PASS, be sure to hit up Kendra’s session on indexing.

I’ll probably be there, making sure you don’t start texting.


Why Missing Index Recommendations Aren’t Perfect

Indexing
6 Comments

Using the good ol’ Stack Overflow public database, get the execution plan for this query – either estimated or actual, doesn’t matter:

In the execution plan, SQL Server asks for a missing index on the LastAccessDate field:

Missing index #1 on LastAccessDate

If you right-click on the plan and click Show Execution Plan XML, you get more technical details:

Missing index #1 XML – equality

SQL Server’s telling us that it needs an index to do an equality search on LastAccessDate – because our query says LastAccessDate = ‘2016/11/10’.

But in reality, that’s not how you access datetime fields because it won’t give you everyone who accessed the system on 2016/11/10 – it only gives you 2016/11/10 00:00:00. Instead, you need to see everyone on that day, like this:

The missing index is just a little bit different already:

Missing index #2: now with Id field included

Huh. Now SQL Server is recommending that we include the Id field. Technically, Id is the clustering key field on the Users table – so why did SQL Server tell us to include it this time? God only knows, and by God I mean the people on the query optimizer team, because in my eyes, they’re gods.

Let’s see the hand of God by right-clicking on the query, click Show Execution Plan, and:

Missing index #2: inequality on LastAccessDate

This time around, SQL Server’s saying that we’re doing an INequality search on LastAccessDate. When you look for a range of values, you get an INequality index recommendation.

What happens when we use NULL and NOT NULL?

Are those considered equality searches? Well, I wanted to show you that with LastAccessDate just to keep the demo consistent, but there’s something amusing about the way the LastAccessDate field is created with SODDI: it’s not nullable. As a result, SQL Server is way too smart to ask for an index on that field:

Is NULL or is NOT NULL

Instead, I’ll use the nullable Location field. In both cases, SQL Server asks for an index on the Location field:

Indexes on nullable Location field

As you might expect, the IS NOT NULL query’s missing index is an INequality one, since we’re not doing an exact value match:

Inequality index on location

Whereas the IS NULL is considered an equality search, and gets a separate equality index recommendation:

Equality index on location

Note that I got two different hints in the XML because my two queries were separated by a GO. When I ran only one batch with both queries in it, and no GO, I got a single missing index with INequality.

What happens when we filter on two fields?

Here’s where understanding the difference between equality and inequality starts to pay off:

The LastAccessDate is an EQUALITY search, while Location is an INEQUALITY search:

The missing index XML

And when you look at the missing index hint on the query, the equality field is first:

Seen on the side of a milk carton

This recommendation makes perfect sense for this query because LastAccessDate is very selective: it’s going to filter our query results down to nearly no rows. Generally speaking, you want the most selective field first in your nonclustered indexes. The IS NOT NULL search on Location isn’t selective at all – it brings back millions of rows – so you want to perform that filtering second.

But what happens in the opposite scenario – when the equality search isn’t all that selective at all?

In the query plan’s XML, the missing index hints show that we’re doing an equality search on Reputation, and an inequality search on Age:

Equality on Reputation, inequality on Age

But here comes the curveball: there are millions of users with only one reputation point, but only about 1,000 users with Age between 90 and 99. Age is way more selective in this case. So what does SQL Server Management Studio recommend for our missing index?

See the full details at SadTrombone.com

The first index recommendation bombshell:
the index fields aren’t in order.

SSMS is following three out of four of Books Online’s guidelines for human beings to interpret missing index DMV data:

Don’t be sad, cuz three out of four ain’t bad

It’s up to you, dear reader, to do the fourth bullet: order the fields based on their selectivity. SSMS can’t do that when it’s rendering a query plan, because that would require querying your database. sp_BlitzIndex suffers from this same limitation, too, as does every single tool that uses the missing index DMVs.

I’m not mad at Microsoft, either: they do a great job of laying out the issues on the page Limitations of the Missing Indexes Feature.

In this case, the ramifications are no big deal: if we create SQL Server’s index on Reputation, then Age, it can seek first to Reputation = 1, then seek to Age = 90, and it gets the job done in just a handful of reads. However, in real-world, 3-4-5 key indexes, with a mix of equality and inequality predicates in your query, plus grouping, the difference can be staggering.

When you see a missing index request, don’t think of it as a request to create a specific index. Think of it as SQL Server tapping you on the shoulder, going, “Hey, friend, take a look at your indexes.”


First Responder Kit Release: Yukon Ho!

We’re going to Alaska this month, so try not to find any bugs, okay?

To celebrate, I’m channeling childhood nostalgia, and quoting a verse from one of my many Taco-Bell-dripping-stained Calvin and Hobbes tomes.

We’ll never have to clean a plate,
Of veggie goops and goos
Messily we’ll masticate,
Using any fork we choose!

When I was a kid, the word “masticate” was always good for a chuckle.

Until I found a dictionary.

You can download the updated FirstResponderKit.zip here.

sp_Blitz Improvements

  • #962@RichBenner – the man, the myth, the guy who cares about trace file names. Thanks to him, you’ll never have to worry about one being folded, spindled, or mutilated.
  • #1008: We added a debug mode. Actually, we added two debug modes. Using @Debug = 1 will print out CheckIds as they run, and @Debug = 2 will print out CheckIds and dynamic SQL used in some checks. This will help us identify errors and slow-moving queries. You’re welcome.
  • #1023: We noticed that there were some undeclared character lengths, and fixed them. Who cares, right?
  • #1029: We’ll only give you warnings about CMEMTHREAD when you have multiple NUMA nodes. Spiffy!
  • #1041: When we look at DBCC commands that have been run in the default trace, we ignore anything run by common monitoring tools. By “common” we mean “good”. If your monitoring tool isn’t on the list, perhaps this is a good time for silent reflection about the choices you’ve made.
  • #1042: The only thing worse than knowing fill factor is low is not knowing how low it is. Now you’ll know, and you can begin the long march to the bottom of your Desk Whiskey® bottle.
  • #1043: Endpoints; ah, Endpoints. Those verdant fields. It’s fine if sa owns them. Now you’ll know that too.
  • #1049: The memory dumps check was missing a BEGIN and END. Using advanced control-flow technology, we can now avoid running syntax when our Super Cool Consultant Criteria® isn’t met.

sp_BlitzCache Improvements

  • #1030: We’ll give you a holler if there’s a high variation between the rows estimated to come out of a query’s SELECT operator, and how many rows are returned on average by a query. This is v1, so if you run into anything weird, please let us know!

sp_BlitzFirst Improvements

  • Nothing this time around

sp_BlitzIndex Improvements

  • Nothing this time around

sp_BlitzWho Improvements

  • #1017: Not all blocking would be flagged. Sometimes none of it would be. The whole thing stunk! Stunk! Like the way those zit popping videos probably smell.
  • #1031: DMVs are awful. Has anyone else noticed that? One day you’re cruising along with a positive elapsed time, the next day SQL Server decides you’re a negative number. Yeah. Cool. Thanks, SQL Server.

sp_DatabaseRestore Improvements

  • Nothing this time around

sp_BlitzBackups

  • #977: In the humble words of @TheUsernameSelectionSucks “this is broken”. Now it’s fixed, with the healing powers of QUOTENAME. It turns out that thing you only ever used in dynamic SQL is good for other things, too.
  • #1039: Moving files broke really easily, and didn’t take into account if you were restoring a database with a different name. Now it breaks far less easily, and takes name changes into consideration.

sp_BlitzQueryStore

  • #908: Added filtering for plan_id and query_id. Why? Because I needed it for 24HOP.
  • #1030: Same row variation item that got added to sp_BlitzCache
  • #1034: Future-proofing for SQL Server 2017. Some new columns got added with query metrics in them.

sp_AllNightLog and sp_AllNightLog_Setup

There were like 10 changes here, and I’m far too hungry to write something funny for all of them. If you’re interested, here’s the full list of issues in this release.

Installer Scripts

Two turned into three! Yes, three! There are now two installers for the core Blitz scripts. One has sp_BlitzQueryStore in it, and the other doesn’t. The installer for all scripts includes everything. I’m not making a special one that doesn’t have sp_BlitzQueryStore in it for that one.

Deprecated

sp_BlitzRS and sp_BlitzTrace are no longer under active development. Whatever version of them you have is the final version. It would take a really, really impressive code change to get us to take them out of retirement. If you want them, you can still find them here.

You can download the updated FirstResponderKit.zip here.


What is the biggest mistake you made in production?

SQL Server
76 Comments

This has been blogged before by several people. One thing I’ve learned from Brent is to not let the fact that the topic has been blogged before stop you from blogging about it. With that out of the way…

When you have sysadmin access, you are bound to make a big mistake in production at some point in your career. Not everyone has. Maybe they’re perfect, or maybe it just hasn’t happened yet. I’ve made two big mistakes in production.

Mistake #1

15 years ago. I remember it like it was yesterday. This was back in SQL Server 2000 days. We were using Log Shipping for our Disaster Recovery needs. This company failed over to the DR site 1-2 times per year and would run production there for 2-3 weeks at a time. This had two big benefits.

-Proves to the customers that the DR plan works (as long as resources are availabe in an unplanned situation)
-Allows you to do major maintenance at the primary site

During one of the planned failover maintenance windows, I was tasked with dropping Log Shipping and setting it up again in the reverse (from DR site to primary site). One of the drawbacks with Log Shipping is that after you failover to the secondary server, you can’t reverse it without dropping Log Shipping and setting it up again.

Well 15 years ago, I was a lot less experienced than I am now. My steps would be different these days. But here’s what I used to do:

  1. Drop Log Shipping
  2. Check the Log Shipping tables in the msdb database to make sure it dropped cleanly (back then there were often rows that had to be manually deleted)
  3. Drop the database at the primary site to avoid having to use WITH MOVE in the RESTORE DATABASE command
  4. Setup Log Shipping

I accidentally dropped the database at the DR site where production was now running. I immediately told my manager and the manager of the WebOps team. We were still inside our maintenance window, so I started restoring the database. I restored to the last transaction log backup that had been run, but there’d be data loss if I stopped there. I got lucky on that system that the incoming data could be recovered from text files. The bad news was that someone else had to do that task as it wasn’t a SQL Server task. I felt bad for not only having dropped the database but that I increased the work of someone else who already had a ton of stuff to complete in that maintenance window.

Now every time that I have had to drop a database in production, it takes me a good minute before I am confident that I am doing it on the right server and on the right database.

Mistake #2

I’m embarrassed to say that my second big mistake in production occurred 2 years ago. The system that this big mistake occurred on required many manual data changes due to various reasons, such as the application not having a certain feature or an application bug that hadn’t been fixed yet.

I needed to delete some data in a table. You see where this is going, right?

First I had to query the table to see what needed to be deleted so that I could get my WHERE clause right. I changed the SELECT to a DELETE. The full DELETE query was in the query window including the WHERE clause. I highlighted the query (I had other things in the query window) and hit F5.

The problem was that I accidentally missed highlighting the WHERE clause and hit F5 too fast. To top it off, I had dismissed the SSMSToolsPack warning about deleting data without a WHERE clause. I was so used to dismissing the warning that I clicked it without even thinking about what it said.

My stomach dropped. I noted the time (this is important for point-in-time recovery). I ran to my manager’s office and fessed up to what I had done. He told me to start the side-by-side restore while he worked on figuring out the impact. We had enough disk space that I could restore the database with a different name so that there wouldn’t be downtime. I restored the database and then the transaction logs to the point in time before the DELETE command had been run. I then copied the data over to the actual database. There wasn’t foreign key data that I had to worry about, luckily. It was just the data in this table that needed to be recovered.

So, fess up! What have you done?

What is the biggest mistake you made in production? Feel free to comment with non-SQL Server mistakes, just as long as they were made in production. Mistakes in non-production can be costly, but doing them in production takes it to another level.

Brent says: I’ve done a lot of dumb stuff, and I should probably write a whole compendium blog post. One of my favorites was trying to move a big 4U rack-mount server from a bottom spot to a top spot, by myself, in the middle of the night during a Saturday maintenance window. I’m no Erik Darling, and my pencil-thin arms managed to drop the server from above my head. It missed my foot by maybe an inch, shattering the raised floor tile. Took me a good half hour just to calm down. I have to hand it to IBM, too – that x445 kept right on working when I plugged it back in.


[Video] Last Season’s Performance Tuning Techniques

#SQLPass, Development, Videos
0

You’re kinda-sorta comfortable doing performance tuning on SQL Server. You’ve read a few blogs, you monitor Page Life Expectancy, you rebuild your indexes, and you add an index here or there. However, you haven’t been to a day-long performance tuning class yet, and you’re wondering what you’re missing.

Thing is, in SQL Server, performance tuning changes fast. One day Perfmon counters are in, and the next day they’re out. It’s totally okay if you’re still using SQL Server 2008 – that’s not the problem. You shouldn’t be wearing hammer pants, and you shouldn’t be looking at PLE, monitoring Disk Queue Length, or putting your data and log files on separate drives.

In this rapid-fire session, Erik and I will show you specific examples of why performance tuning tips and tricks have fallen out of fashion, and why they’re not too legit to quit:

24HOP Summit Preview Last Seasons Performance Tuning Techniques

You can get the slides and demos here.

(While there’s some awesome historical photos of Erik and I in here, PASS didn’t include the video camera recordings. That’s kind of a bummer because I had some awesome costumes.)

If you learned stuff in that session, that’s a good thing: it means that while your skills might be drifting out of date, you can still catch back up. Erik and I are doing a full-day pre-con at the Summit called Expert Performance Tuning for SQL Server 2016 & 2017. Go read more about that, check out the cool free stuff you’ll get for attending, and then register now. See you in Seattle!


[Video] Office Hours 2017/07/29 (With Transcriptions)

SQL Server, Videos
0

This week, Brent, Erik, Tara, and Richie discuss SQL Server performance monitoring, ASYNC_IO_COMPLETION, AWS SQL Server licensing and fees, rebuilding indexes, licensing model for Blitz scripts, stored procedures, using (or not using) primary key on a table, choosing ORMs and other tools, and more.

Here’s the video on YouTube:

Office Hours Webcast - 2017/07/26

You can register to attend next week’s Office Hours, or subscribe to our podcast to listen on the go.

Enjoy the Podcast?

Don’t miss an episode, subscribe via iTunes, Stitcher or RSS.
Leave us a review in iTunes

Office Hours Webcast – 2017-07-27

 

What should I monitor on 2,500 SQL Servers?

Brent Ozar: Christopher says, “2500 SQL Servers and I’m growing at about ten per week…” That’s kind of like my savings account right there, “And I’m managing [inaudible] – if I have to monitor 2500 SQL Servers, what performance statistics would you look at in order to find the really big problems that I would have to fix right now?” He says, “The schema across all these instances [inaudible], it’s just that the amount of data varies.” Wow, that’s an interesting question.

Tara Kizer: That’s a lot of servers, how big is the team?

Brent Ozar: I was going to ask the same thing, yeah.

Tara Kizer: I’ve worked at a company where there were, I think, five local DBAs and we had three offshore DBAs, you know, eight of us, and there were about 700 servers, I think. I imagine that these 2500 servers are small, maybe they’re virtual machines. Sounds like it’s the same database across each of these systems, but that’s still a lot of servers, there’s a lot of patching you have to do; maybe they’re not patching though. But I’d want to know about just the [crosstalk]. You know, CPU utilization, job failures, I mean just the normal set of monitoring stuff and just making sure that all that stuff is going to the team.

Brent Ozar: Oh man, he has a follow-up. He says, “The dev team is 30 people, there’s only one DBA.” But he says, “I’m more of a perf…” At least he says, “I’m more of a [inaudible] developer.

Tara Kizer: One DBA… [crosstalk] Don’t even have time to work on any of this stuff, you know. Yeah, you can start monitoring it, but who’s going to work on the alerts?

Brent Ozar: How do you even deploy performance monitoring across 2500 servers when you only have one full-time DBA? [crosstalk]… That’s true, oh, I don’t even know that I would want alerts of 2500 servers if I’ve only got 30 developers. That’s pretty tough. Man, what would I collect? I would collect wait time per hour, but I don’t know that I would go any deeper than that because, in terms of urgency, the amount of data you’re going to get is going to be staggeringly tough to slice and dice. Anything else that I would gather – yeah, really I would just go to wait time per hour. Then you set – whichever servers are having the worst wait time per hour, then go tackle those servers. But the amount of data that you would have to collect, even just building a database to house that across 2500 servers, is going to be a problem.

Erik Darling: Yeah, I’d probably want to get really clear priorities for which servers I need to focus on, from management. And that’s based just – and that’s probably going to just be based on, you know, how much money is coming in from certain people every month. So I’d probably just want to get, you know, three or five servers from my boss that they want me to focus on, focus on those and then see if any of the performance things can trickle down to other servers in the environment. For example, if you find some really slam-dunk missing indexes on your top three to five servers, it’s totally possible that when you go and you start looking at your other servers, those missing indexes, if they’re not usable now, will be in the future. So I’d probably want to, you know, get some priorities and then – tune my priorities and then see what would apply to the rest of the environment; because it’s a lot easier to collect data from, then analyze and then make meaningful changes on a handful of servers than it is to do it across all of them. So you want to do it and test it where it matters and then deploy across the rest where it would count.

 

Should I do DML with literals or variables?

Brent Ozar: Let’s see, J.H. says, “Which DML statement would be faster? If I do something with parameter values, or do I do it hard-coded as batches like strings?”

Tara Kizer: I don’t understand the question. What is meant by hard-coded versus parameter values? I don’t…

Brent Ozar: I bet you he’s doing stings of, like, insert into or update into and he’s just doing a list of values or if he’s doing parameters. You can run into parameter sniffing. I’m thinking specifically of update statements. You could run into parameter sniffing hitting update statements, but if you want to paste in a couple of examples, then we may be able to tell you more details.

 

Are multi-database joins slower than single-database joins?

Brent Ozar: Let’s see here, [Anupinder] asks, “For performance of queries that do joins across multiple databases on the same server, are they any slower or is there anything extra that I have to look at?”

Tara Kizer: I mean, if it’s on the same server, it’s just the normal stuff. Just look at the execution plan and see what’s going on there. What is the query waiting on also?

Brent Ozar: Is it any worse than – if I say four tables across four databases, is it any worse than four tables in the same database?

Tara Kizer: I haven’t seen any performance difference between that, but maybe I could be wrong, I don’t know. [crosstalk]

Erik Darling: Usual stuff like volume of data, if my index is lined up for the query that I’m running, all that stuff. You know, look for blocking across databases. That might be a little bit more challenging to not go, because imagine if a join is blocked in one database but not in all the other ones. What’s SQL going to get up to?

 

Is it bad to run linked server queries to localhost?

Brent Ozar: Steve comes back with a clarification, and I’m so sorry to report this answer. He says he’s “Doing a linked server from self to self.”

Tara Kizer: Sucks to be you.

Brent Ozar: Oh dude. Alright, so the problems, and I’m probably going to miss some of the problems, but I’m going to say one and [inaudible] guys know any others. Here’s the big ugly part, whenever you’re doing linked server queries out, those results are not going to be cached; not even just the results, the log data pages aren’t going to be cached. Every time, SQL Server is going to be like, whoa, I got to go do linked server query, who knows what happens over on the other side?

SQL Server can even make dumb decisions about predicate pushdown. It may bring back table results across the linked server and then do the joins to the tables. So if at any possibility, I would go try to run them without the double linked server and see what happens, see if you get a performance difference there. I’ve seen a couple of cases where it was staggering. Like SQL Server was dumping a bunch of stuff into tempdb because it didn’t have enough RAM to cache all these tables that it was bringing back from the linked server. Any other things that I would worry about there? Distributed transactions would suck. Anything else that I would worry about?

Erik Darling: I’m just sill so puzzled by the concept of it and how this is the vendor’s big idea for high availability. Like, performance hasn’t even hit my mind yet, I’m just…

Tara Kizer: I know, what article did they read that they thought that it was a good idea and it would solve [crosstalk]

Erik Darling: It’s like, no, no, no we’ll query the server from itself to itself using a link. That’s like brushing your teeth with your weak hand, like what?

Brent Ozar: The one that I saw once that – a vendor was telling me about this, he was like, “Look, we just have one linked server for production, we have another linked server for dev and another for QA, but sometimes they’re on the same server, sometimes they’re on different database servers…” I’m like, no, no, oh god no, no.

 

What does ASYNC_IO_COMPLETION mean?

Brent Ozar: Let’s see, Graham says, “We have an instance that’s [inaudible] thousand batch requests per second.” He says, “My expensive queries are always updates and inserts but my biggest wait type is async I/O completion. What does Async I/O completion mean?”

Erik Darling: That’s tied up into backups a lot.

Tara Kizer: Backups and [crosstalk]. It keeps coming up on client’s systems, and I’ve been saying, you know, this is the full backups or possibly differential backups. It doesn’t apply to the transaction log backups. But I suspect – I mean it’s also at checkpoints and I think that the checkpoints are notable on these client systems. So in the past, I’ve said it’s  probably your backups, your backup throughput is not the greatest, but when their backup is fine, I think they are experiencing an I/O issue, even though the other I/O wait stats aren’t as significant. But I think it’s the checkpoints, you know, writing the dirty pages to disk.

Erik Darling: I’ve never seen those tied in together. That’d be interesting if it did. Just for async I/O completion, it’s always just like from the start of a backup to the end of a backup, you’re accruing that wait.

Tara Kizer: Right, so the number of average waiting time on, it doesn’t reflect the backup job time. That’s why I’ve been confused on it, so I looked up Paul Randall’s async I/O completion article and he mentions checkpoints in there. So yeah, I wish that these wait stats wouldn’t have multiple meanings, you know. Separate them out, separate checkpoints out from backups.

 

Does AWS have extra licensing fees for SQL Server?

Brent Ozar: J.H. says, “Does AWS SQL Server have any extra fees, like the amount of data that gets queried, high availability clustering, or are all these fees upfront with licensing?” So licensing is one part of it, you can either bring your own licensing or you can pay Amazon by the hour, or longer term if you do reserved instances. There are the normal Amazon, and all cloud providers, have egress fees for networking, like how much data you pull out of the server. Generally, in terms of queries, that’s not a big deal. But if you do backups every day and you pull those backups out of SQL Server and put them somewhere else like on-premises or in another cloud, you can run into egress charges if you do enough of that.

 

Do stats updates help my update queries?

Brent Ozar: Tim says, “I’ve got an ERP update that times out updating a table. Whenever I rebuild my indexes, all of a sudden these updates no longer time out. Could this be an issue with statistics?”

Tara Kizer: Statistics or just, you know, putting those indexes back, you might get a different execution plan. So it could just be a bad plan cache too. Do people still – when loading data, is it common to still drop all the indexes, load the data in and then create the indexes again?

Brent Ozar: You know, I’ve seen it happen in a couple…

Tara Kizer: Back in the day, that’s what we did, but I don’t know that that’s common anymore.

Erik Darling:  Whenever I’ve had to do it, rather than go through all that, I would always just have a staging table so that I could get my fancy minimal logging and all that other garbage, dump data in as fast as I could, and then put it out to my prod tables in batches so that I didn’t have to drop all my indexes and all that other stuff.

Richie Rump: Yeah, and I guess it depends on how much data you’re loading, right? I mean, if we’re loading a gigabyte of data, I may want to drop the indexes, I don’t know.

Erik Darling: So one thing that I would be curious about is prior to SQL Server 2014, SQL would have a whole lot of trouble with execution plans if there were ascending keys involved. So if your update is only, say, hitting the most recent chunk of data and you haven’t hit a statistics update threshold yet and it’s an ascended key, well you can run into some performance trouble in that regard. And then the index rebuild would update stats at the full scan, you would have that ascended key information in the histogram now, and that might do it.

 

What licensing model to the Blitz scripts use?

Brent Ozar: Tim asks, “What licensing model do the Blitz scripts use?” We use the MIT license, which kind of means that you can do anything with it that you want as long as you retain the copyright notice inside the script. Like if you go hand it out to other people or you change it, you just need to include the copyright notice, but you can charge for it, you can give it to strangers on the street, whatever. And then, “Do you retain any rights to pull back data, even aggregated, for the servers that it’s on?” Because it’s the MIT license, that doesn’t say anything about what we could do with the data, however, all the sp_Blitz scripts are just plain T-SQL that send results back to you. So they don’t do inserts over the internet or push stuff back to us in any way shape or form. We did used to, a long time ago in a galaxy far, far away; we had a Windows app that could have been able to do that. It didn’t, but just in terms of any time you run a Windows app, find out where the data’s going.

Erik Darling: You know, what’s really funny is when I first started working here, there was a conversation about the Windows app. And I Googled something and someone had, like, uploaded a PDF of the findings from the Windows app to the website and it was searchable. And so I click on it and I’m looking at it and I’m like, someone put this miserable sp_Blitz output online and it was searchable. Like you could find it by searching some of the Blitz text. I was like holy cow, someone did that. Like, I thought I was the only one who ever downloaded the app.

Brent Ozar: Oh, what I’ve learned over time is, if you give people an easy way to share stuff, they will do it. One way or another – here’s my social security number, can someone tell me how to fix my credit?

 

Would you rather use the server name or a DNS CNAME?

Brent Ozar: Brandon says, “When you guys let apps or clients connect to a SQL Server, would you rather have them connect to just the server’s name, or would you rather connect a DNS CNAME or a SQL alias and why?”

Tara Kizer: So having done some desktop support in the past and any time the server name changes, you know, it’s painful to update all those connection strings out there in the world. So I’m a big fan of using DNS aliases, and if you’re opt using some kind of fancy feature like either database mirroring or availability groups, which has the ability to point to – you know, like availability groups point to a listener name. but if you’re not using stuff like that – I like the DNS CNAMEs. I have used SQL aliases, but that has to get deployed to every single machine, it’s not just on a SQL Server. That’s every single place that has to connect. You can deploy that easily, it’s just a registry file, you can deploy it easily via some kind of login script, and I’ve done that in the past. But DNS alias, I think, is the best solution if not using something like an availability group listener.

Brent Ozar: As someone who’s had to play around with group policy before to push out SQL alias…

 

Should my SSRS reports use a stored procedure?

Brent Ozar: Dan says, “When you have an SSRS report…” And this really could be for any report, “Would you rather have the data be fetched from a stored procedure or a query built into the SSRS report?”

Tara Kizer: I prefer stored procedures because if you have to make a change, you could just modify the stored procedure. You don’t have to go and fix the report, the application or whatever. You know, I’ve had situations where there was a little tiny bug in the code and it required a full release to get this thing to ploy to production, whereas if it’s in a stored procedure, I could just go fix that real quick. And yeah, I’m going to do change control and all that stuff, but it’s so much simpler. I mean, there’s benefits to using stored procedures besides that, but I don’t like just throwing the queries into the report.

 

Why would you NOT create a primary key on a table?

Brent Ozar: Let’s see here, Mark says, “Why would you not create a primary key on a table?” He says, “I’ve got tables with 100 million or more rows and they’ve got a unique clustered index, but they didn’t create a primary key. Why wouldn’t you create one?”

Tara Kizer: So I had a client, this is a few months ago – it’s a vendor application and there were absolutely no primary keys anywhere, and they needed to move their data to other systems, and so they could not use transactional replication because transactional replication required primary keys. So they were using – I think this client was using peer to peer replication or some other, you know, some other SQL Server replication that did not require primary keys. Now, they did have the unique indexes and unique clustered indexes, but I mean, I like constraints because they’re meaningful and they tell you about the data, whereas a unique index, yeah this portion of the data is unique, but that doesn’t mean that that is the actual constraining uniqueness of that row.

Richie Rump: I mean, I think the only time I wouldn’t put a P key on there is like if it’s a loading table or something like that. Even if it’s a sorted key, I would most likely put it on there. 99 times out of 100, 999 times out of 1000 the primary key is going to go on that production table, unless it’s some sort of loading table, which is probably in a different database, as it were.

Brent Ozar: I’ve heard people say, like, I just don’t want to enforce referential integrity in the database, I believe it’s going to slow me down. So I’ve heard people say that; I don’t agree with it, but I’ve just heard people say it.

Tara Kizer: And it’s always developers that say it.

Richie Rump: There’s this thing called “indexing,” people, that you could put onto those keys to kind of speed things up a little bit. I’ve heard – it’s a thing now, I guess, I don’t know.

 

Is there a best ORM out there?

Brent Ozar: So speaking of things now, Nick asks, “Is there a best ORM out there or are they all pretty much the same?”

Tara Kizer: I don’t care what tool people use. I’m just – I’m coming from the perspective fighting fires in production. I just want things to perform well and not wake me up at three in the morning so that I can do proactive administration and not have to constantly be working on production problems. A lot of the time, production problems are performance based and I’ve had a lot of bad experiences with performance problems that were due to ORM. So I do have a blog post out there that’s got a bunch of activity on it. I don’t remember what the title is but it basically is why I don’t like ORMs. It’s not really about the ORM, it’s about the performance problems that they cause. I don’t care what tools you use, just don’t cause performance problems.

Richie Rump: And I’m the opposite side of that, I kind of like ORMs, I kind of like them a lot. So I don’t think there’s really a best one out there, it’s just a tool that you and your team feels most comfortable with. I’ve been on teams that have just gone straight to stored procedures and it’s thousands of lines of code written by developers, and you can see how fast that’s probably going to run. And we go to an ORM like Entity Framework and things get a lot faster. Now, you’ll run into a lot of issues because the team or the individual doesn’t understand what the ORM does on the backend and how it creates the queries and doesn’t even look at the queries on the backend.

So when you get to more complex stuff, there’s not a lot of thought there. It’s like, oh it works in dev, then you throw it into production with all the production data and it just starts slowing everything down. The best thing you can do for any tool that you have is understand that tool. And I know – understand that it’s hard for us developers because we have lots and lots of tools that do lots of different things, but if you don’t have that understanding, an ORM is going to be bad for you because you’re going to put it in situations that are not the best situations for the tool, for the ORM.

Erik Darling: When it comes to stuff like that, where I don’t have a vast or deep knowledge of what each and every tool is best or worst at, I would just want to ask two questions; what are most people doing and what are most people that are doing what you’re doing using? So that’s how I would at least narrow product choice down. I mean, Richie, can you think of an ORM outside of Entity Framework that’s really competitive with Entity Framework?

Richie Rump: Yeah, Hibernate. And Hibernate has been around forever…

Erik Darling: Okay, so what else? Like if you had to name a top three or five…

Richie Rump: So now you go into micro ORMs like Dapper and… There’s a ton of them. So what happens is that people, you know, like their developers say Entity Framework sucks, so I’m going to go create my own flavor of Entity Framework which is just going to do a small little thing of – it doesn’t do everything entity framework does, but it just does this little thing and it does it really super well. And that’s how you get all these little ORMs going on all over the place; everyone has their own little take on it and now all of this stuff is blowing up and everybody had their own simple version. And yet, mine’s the best, mine’s the best, mine’s the best, and therefore you get these developer wars going on, which I particularly hate; there’s no need.

Brent Ozar: It’s good Richie doesn’t like conflict.

Richie Rump: Yeah, me – I’m here with my hand on the detonate button right now, you know…

Erik Darling: He doesn’t like conflict or developers, so it’s just a weird pacifist line he’s drawn in the sand…

Richie Rump: I want to kill myself, essentially, at any given time, yes.

 

Can I prevent developers from using app logins?

Brent Ozar: J.H. Asks a really interesting question. “Do you have any suggestions on how to prevent developers from connecting in with the app’s logins accounts into production servers?”

Tara Kizer: I’ve struggled with that and for me the answer is why do the developers have that user ID and password? Isn’t there another team that has the production access – developers, at least in my environments, developers did not get to see the connection strings in production. They could see it in other environments. At my last job, or a couple jobs ago, they were talking about having some kind of encrypted application that we would control that we would be able to put whatever password we had selected when we created the account and then it would be encrypted. So even if developers had access to be able to look at the web config or whatever file it was, they couldn’t read the value and use it.

Richie Rump: Yeah, and there’s a setting, if you’re using c# in Windows config or app.config or whatever, if you’re using .NET, you can encrypt those encryption strings so nobody can see them. And it’s actually really easy to do inside the application to encrypt it.

 

What’s the biggest problem you faced when becoming consultants?

Brent Ozar: Now, last question that we’ll take, and this one’s kind of tricky. This is from Graham, and Graham, we may not be the right people to answer it, but we’ll give it a shot. “What’s the biggest issue or problem that you all faced when becoming consultants? Do you have any tips on how to get plugged into the SQL Server consulting community? I know blogging helps, but what else?” Richie, you went off on your own and did the consulting thing for a while; what kind of things did you see people going off and doing to get into the consulting business?”

Richie Rump: I saw a lot of people struggling, like myself. It’s a grind, right. I mean it’s a hard thing to do. So much so that I went and got a full-time job because there’s a lot of it that I just didn’t like. So the marketing and – it’s a constant getting yourself out there, going to different conferences and meeting different people, you know, write people, getting people to say yes. There’s a lot of soft skills involved within that. The stuff that Brent does really, really well, I don’t do well at all. So hey, now you know why I’m working for him; that’s just kind of the way that works.

Brent Ozar: Yeah, the hardest part by far is getting the business, getting enough business to keep yourself busy. And it’s this weird cycle where if you don’t do enough of the marketing then you don’t bring enough business in. so people will go off and they’ll get a 40 hour a week contract, it’ll last for three months, but when the contract’s over, there’s no business coming in if you weren’t building up that funnel leading up to it. So you nailed it, Graham, with blogging. Anything, blogging, presenting, writing, doing guest posts for vendors, webcasts for vendors, you’ve got to be out continuously, beating the drum going here’s what I’m good at, when you need help with this, call me. If you don’t love doing that and if you can’t do it at least eight to 16 hours a week, go work for a consulting company. It’s so easy to go to join a consulting company. Say like ours, you’ve got me, who does some of the marketing stuff to bring business in and some of the sales stuff in, but then you don’t have the pressure of how do I bring business in. It is really, really terrifying to try to [inaudible]

Richie Rump: yeah, I would even say it’s more than just the 16 hours, right? I mean I was pouring – it was almost like a second job to me, you know. Coming home, it’s three o’clock on the morning I’m still cranking out podcasts and practicing presentations and trying to avoid writing. That was kind of the norm for three and a half years. Unlike Brent, my 40 hour a week client didn’t last three months, it actually lasted three years. So that kind of helped prolong it a bit, but I still had real trouble getting people into that funnel.

Erik Darling: I was going to say, beyond marketing, just establishing a reputation has to be tough, like a trustworthy brand and reputation has to be tough. You could really establish any kind of reputation you want, but having it be a good one to bring in business is completely different.

Richie Rump: Even if you have a bad reputation [crosstalk]

Brent Ozar: Alright, well thanks everybody for hanging out with us this week. Adios, everybody.


Live Blogging: Erik vs. PowerShell

Humor
42 Comments

Oh, hey, happy Saturday

First, I’d like to apologize to Conor Cunningham for the blatant theft of a blog title.

That’s what you get for not posting in three years (like the desert miss the rain~).

Why am I here? Why am I writing PowerShell, my sworn enemy?

I do it for you. I’m trying to be a good open source maintainer, so you don’t end up with a useless (hmpf! useless!) script on your servers. Instead of saying “go ahead and drop it if you don’t want it”, I’m going to properly address the issue.

Here’s where I’ve been so far:

Keeping it Family Friendly®

Well, alright, so at least I’m narrowing things down.

Brent Cares©

He’s going to spend $10,000 on brunch for Ernie after this.

Adventures In PowerHELL

But hey, you know, PowerShell has a Copy-Item command.

Off to a good start!

According to the laws of “what’s the most obtuse way we can give people to clobber files together”, we have to use not one but TWO commands!

We have to Get-Content and pipeline (|) it to Set-Content.

Sounds reasonable, let’s get going.

Calgon, take me away

Where will we go next? Follow along!

Alright, let’s try a different command. Maybe the administrator thing is silly. Maybe I’m stepping in the wrong direction.

What… What doesn’t exist?

It existed a minute ago when you didn’t have permission, dummy.

Maybe I’ll try using -LiteralPath, because that’s an option. I like options. If I could drive, my Citroen would have 42 ski racks.

No, that doesn’t exist either. Riiiiiight.
Psychic friend

WE ARE NOW RUNNING AS ADMINISTRATOR, EVERYONE STEP BACK

Do you believe?

Access is still denied running in Admin mode. That’s nice. Time to stick in some humorous dialog while I try to figure out something else to do.

White Whine

Breaking through!

Gotta call me on the yacht

ACCESS NOT DENIED!

SMH

DEAR GOD WE HAVE FILTERING

It’s like progress, but less rewarding

We’re having fun, really.

I’d Rather Be In Jersey

Alright, now we have to try to exclude sp_BlitzQueryStore!

So uh, about that.

What on earth

Holy cow, it took 15 minutes of tinkering to get this working.

Apparently if you want to do what I’m trying to do, you can’t use (???) -Filter. Maybe? Something like that? You need to use -Include and -Exclude. And you need to… surround your path in quotes. And use a wildcard at the end? I don’t even understand how I ended up here, honestly, I just kept changing things until something worked.

[Sometime around 10:30]

INTUITIVE AND USER-FRIENDLY THEY SAID

LEARN IT IN AN AFTERNOON THEY SAID

THERE GOES YOUR SATURDAY MORNING, THEY NEVER SAID

Finally this is what worked.

I’m so alone.

So an hour and a half later, I have a list of files. I still have to get their content and set their content. This is fun. Learning. Hey, does anyone have a book on Replication?

Falling flat

Alright, so knowing just enough to be dangerous, after a pipeline I can pass in [something] from the previous command. So, I tried $.Name, and $.Path and $.FullPath. They all threw about the same error.

[Sometime around 10:45]

My wife is now standing in the office doorway asking me when I’ll be done. Good question.

At first glance, never.

Oh… Select-Object.

HNNNGGGGG

Shout out to Cody Konior for being the only person to blog about this using -Property *.

Alright, so now that I can get the right path, all I need to do is figure out how to pass that to Get-Content. Fingers crossed. Here we go.

Nope.

Oh. I need a ForEach loop. Okay.

[Sometime around 11, I think]

If anyone’s wondering, Brent left 45 minutes ago to to try to smother himself with a pillow.

But success! We’re now able to print the contents of every file out to the console. That must mean we’re close to being able to dump them to a single file.

HOLY MOTHER OF GOD IT WORKED

Wrap up

If you’re out there wondering why people hate PowerShell, here’s a good example of why. There was a whole lot of nonsense just to get some files copied into a single file. None of it was intuitive, and it took a lot of searching and tinkering to get things right. I can’t imagine what it’s like when you have to do something more complicated.

Now I have to go watch the Emoji movie sober and my wife is mad at me.

Thanks, PowerShell.


SQL Server 2017: Less xp_cmdshell?

Pokin’

SQL Server 2017 RC1 dropping recently reminded me of a couple things I wanted to blog about finding in there. One that I thought was rather interesting is a new iTVF called dm_os_enumerate_filesystem. It looks like a partial replacement for xp_cmdshell in that, well, you can do the equivalent of running a dir command, with some filtering.

The addition of a search filter is particularly nice, since the dir command isn’t exactly robust in that area. If you’ve ever wanted to filter on a date, well… That’s probably when PowerShell got invented.

If I run a simple call to the new function like so…

I get this back:

This can be particularly useful for backup tasks, or folder watching tasks. For instance, I can look in a backup directory for recent backups with a size greater than 0 bytes that haven’t been written to in the last 30 seconds (a pretty good sign that a backup is completed, no?)

And that’s a heck of a neck easier than what we have to do, currently.

Thanks for reading!


ColumnStore Indexes And Recursive CTEs

Alone Together

When I think about SQL Server features, I often picture a high school cafeteria.

AGs are making fun of Mirroring, index rebuilds are walking around taking everyone’s lunch, dta is making a glue sandwich, and no one knows who Service Broker is even though they’ve been going to school together since Kindergarten.

At the artsy table are two oft-misunderstood and unpopular features: ColumnStore indexes and Recursive CTEs. The oldest person on Stack Overflow, Joe Obbish, recently blogged about a deficiency in ColumnStore indexes when it came to doing string aggregation. When someone says “X is bad at Y”, my first reaction is generally to try to find other things X is bad at, and kicking X while it’s down.

Lace Up Your Boots

I know what you’re thinking: WHY WOULD YOU EVER DO THIS?

Well, ladies and gentlemen, that’s what consultants specialize in.

Stuff no one else would ever do.

There’s a really interesting relationship in the Stack Overflow database both within the Posts table, and connecting the User and Comment tables. You can build up pretty interesting hierarchies with them.

The Posts table has both questions and answers in it. You can tell them apart using the PostTypeId column, and you can also look at the ParentId column. If a post is an answer to a question, it’ll have the Id of the question populated.

Likewise, the Comments table connects comments to posts and answers, and both tables track the Ids of users. This comes in Handy for grabbing DisplayName from the Users table.

Having spent an amount of time only a consultant can looking at stuff in the database, I know that post 184618 currently (2016-03 data dump) has the most answers in the Posts table, at 518. So that’ll be our performance bar.

Gangland

Let’s start by indexing and querying rowstore tables.

Here are our awesome nonclustered indexes.

And here’s our extra fun recursive query.

With my indexes in place, here are my stats Time and IO results. I’ve lopped off all the zero-read operations. for readability.

I even have what most would consider a reasonable execution plan for such a query.

Methinks the lady doth compute too many scalars.

Inflection Point

With that in mind, let’s hop on over to my ColumnStore version of the Stack Overflow data dump. All of the tables here either have a clustered ColumnStore index on them, or a nonclustered one if they have columns with datatypes that make that impossible (Users and Posts both have columns that are MAX types, which makes CS puke).

Running the same query, here are the results of stats Time and IO. What you’re looking at are the results of the second run with a warm cache. The first run took about 4 minutes and did a bunch of physical reads, and that didn’t seem fair.

If you’re wondering what those milliseconds are in minutes, it’s 3:17. So uh, there’s our first problem.

And here’s the query plan, which bears some interpretive dance. It looks a whole heck of a lot like the previous plan. Prior plan? One of those. Except two index spools are introduced.

We’re a lot alike, you and I

Funny things you should know about Index Spools: They’re single threaded, and I’m going to go out on a limb and say SQL doesn’t build ColumnStore Index Spools. They cache data in tempdb. When they’re Eager, they cache all the rows coming to them, and when they’re Lazy they only cache rows as required. This makes Lazy Spools much more benign than Eager Spools (there’s a Lazy Spool in both of the query plans).

They also make up the majority of the work our query is doing here in ColumnStore land. There are other obvious problems with it, like it running in Row mode rather than Batch mode. I stuck the query plan in Paste The Plan for anyone interested in poking around more.

I’m not getting any prettier, here

Stuff like this is part of the reason why we’ve started warning about Index Spools in sp_BlitzCache and sp_BlitzQueryStore. We started warning about ColumnStore indexes operating in Row Mode a while back because it’s basically a death knell for performance.

This post should have a point, right? ColumnStore! Not good at everything. Pretty spiffy for aggregations, but there are plenty of times when traditional row store indexes will out-perform them.

For more information about Spools, check out these links:

Fabiano Amorim: Part 1 and Part 2

Paul White: Part 1, Part 2, Part 3

Rob Farley: Part 1

A most excellent dba.stackexchange.com question, in which Joe Obbish gets prescribed some Spool Softener

Thanks for reading!


Other People’s Blog Posts I Talk About the Most

Development
5 Comments

In my work with clients and classes, some blog posts come up a LOT.

Forcing a Parallel Query Execution Plan by Paul White – Paul’s posts usually cover execution plan components in incredible detail, but the real gem in this one is the section called “Parallelism-Inhibiting Components.” If your T-SQL includes this stuff, the whole plan or a zone of it will go single-threaded.

Use Caution with SQL Server’s MERGE Statement by Aaron Bertrand – Whenever someone’s question starts with, “I’m using MERGE to…”, I immediately open this page and show them the list of known problems with MERGE. Their response is, “Oh my God, you mean these are really known bugs? What should I use instead of MERGE?” We have a discussion about rolling your own upsert. MERGE needs a 12-step recovery program.

Take Care When Scripting Batches by Michael J. Swart – When you need to affect a lot of rows – like a big delete or update – and you need to do it repeatedly over time, batching can help. If you really want the most performance, you need to run scientific experiments like Swart does in his post:

Logical reads per delete – read Swart’s post for what it means

Serverless Architectures by Mike Roberts – Also known as Function-as-a-Service (FaaS), serverless is a new way of application development and deployment. This post is a monster, but just read the first section, “What is Serverless?” and stop at the “Benefits” section. Knowing just that part will help you have better conversations with architects and developers.

Juggling Writing AND a Job? Figure It the %$&* Out by Chad Gervich – This one’s professional development, not tech, but it comes up a lot. It’s for people with day jobs who want to become screenwriters, but the exact same advice holds true for aspiring bloggers. It doesn’t matter if you want to build something – we all wanna build stuff. You also have to have the spare time, or make it by making sacrifices in other parts of your life.

Slow in the Application, Fast in SSMS by Erland Sommarskog – Listeners of our Office Hours podcast probably feel like this one comes up every single week.

Maker’s Schedule, Manager’s Schedule – if you want an employee to finish hard tasks, block out their calendar so they don’t have to context switch.


Let’s Corrupt a Database Together, Part 3: Detecting Corruption

So far in this series, I’ve shown you how to corrupt a clustered index, then how nonclustered indexes can be corrupted independently. If you haven’t read those, you should start there first.

Let’s start again with our 50Ways database, but this time we’re going to be good and make sure that we’ve got checksums enabled, and that we’re in full recovery model:

Then fire open your trusty hex editor, like xvi32, and open up the MDF file just like we did in Part 1. Change Stan’s name to Flan, save the MDF, close your hex editor, and bring the database back online again:

Thanks to the magic of checksums, SQL Server knows what you did last summer:

So the table is toast, right?

Try inserting more rows.

Let’s add another four rows:

And if the moon is right, it works just fine.

4 rows affected

Adding more data doesn’t necessarily detect corruption. But if you try to select them back out, you’re still going to have a problem.

This is the worst part about database corruption – SQL Server will let your users keep right on databasin’, adding more data into a corrupt database. You have a serious time bomb on your hands here: the longer you let this go on, the more dangerous things get. Watch this.

Take a full and a transaction log backup of your database again:

And they work fine – no errors. And guess what happens when you try restores? You’ll need to change M:\MSSQL\DATA\ to your own data/log paths here (I try to keep scripts super-generic so they work everywhere, but restore ain’t so flexible):

No errors there either.

By default, backups and restores don’t detect corruption.

So now think through the timeline of what we just did:

  1. We created a database
  2. We put some data in it
  3. We took a full backup #1
  4. We took a log backup #1
  5. We corrupted the data
  6. We took full backup #2 (which backed up a corrupt data page)
  7. We took log backup #2

If we need to recover from the corruption that just happened (and assuming we can’t do page-level restores, which is for another blog post), the proper sequence is:

  1. Take a tail-of-the-log backup, which gets the last of our transactions and seals the database as read-only
  2. Restore full backup #1 (from before the corruption happened)
  3. Restore log backup #1
  4. (Skip full backup #2 altogether, because its data page is corrupt)
  5. Restore log backup #2
  6. Restore the tail-of-the-log backup

But this only works if you actually have log backup #1. This is kinda horrifying because I bet you:

  • Do CHECKDB once a week, like on Saturdays
  • Do full backups every day
  • Do transaction log backups multiple times per day, like hourly
  • But due to limited drive space, you delete log backups older than 24-48 hours

So take this timeline:

  • Saturday – DBCC CHECKDB runs, reports success
  • Sunday – full database backup, and logs all day long
  • Monday – full database backup, and logs all day long,
    and we delete Sunday’s log backups
  • Tuesday – full database backup, and logs all day long,
    and we delete Monday’s log backups
  • Wednesday – we find database corruption.

Depending on when the corruption happened, you may not be able to restore without experiencing serious data loss.

Thankfully, it’s easy to detect corruption.

Once you’re aware of this problem, you have a few great options.

You could keep your log backup files around longer. In theory, you keep them all the way back to your last clean CHECKDB. In practice, you’ll need to keep them longer than that. If you do CHECKDB every 7 days, and you delete log files older than 7 days, then when CHECKDB fails, a human being probably won’t disable the log-deletion job fast enough to keep the log backups online. In that scenario, 10-14 days of log backups might be a better choice – especially if there’s only one DBA, and everyone else just leaves the alert emails for the DBA to handle when they get back from vacation.

You could run CHECKDB more often, or on a restored copy of production. It kills me when I see people doing index rebuilds every night, but CHECKDB only once a week. Your priorities are backwards. DBAs get fired for lost data, not for slow performance. (Are you kidding me? You actually get extra training and tools budgets when your server is slow. Come to think of it, you should probably go hit the turbo button on the SQL Server just to make it slow down for a while. The turbo button doesn’t exist anymore? Maybe change your power plan down to Fair and Balanced instead of High Performance.)

Configure your alerts and failsafe operator. By default, SQL Server keeps its secrets and doesn’t tell you when it detects a corrupt page. It’s crazy easy to set up alerts – check out our SQL Server Setup Guide in the First Responder Kit.

Use the WITH CHECKSUM command on backups. When specified, this makes SQL Server check the checksum on pages while backing them up. It’s not as good as a full CHECKDB, but in the case of our toy database, it works like a charm:

You could implement mirroring or Always On Availability Groups. By themselves, these don’t make it easier to detect corruption, but they make it easier to deal with the aftermath. Sure, they have Automatic Page Repair, but they also have something simpler: a separate copy of your database’s data files. Instead of your data files being a single point of failure, now they’ve got redundancy from storage failures (assuming of course that they’re on separate physical storage, not on the same storage device.)


[Video] Office Hours 2017/07/19 (With Transcriptions)

SQL Server, Videos
0

This week, Erik and Richie discuss performance tuning, relativity, Always On Availability Groups, parameter sniffing, technical debt, unit testing, deadlocks, testing storage performance, and more…

Here’s the video on YouTube:

Office Hours Webcast - 2017/07/19

You can register to attend next week’s Office Hours, or subscribe to our podcast to listen on the go.

Enjoy the Podcast?

Don’t miss an episode, subscribe via iTunes, Stitcher or RSS.
Leave us a review in iTunes

Office Hours Webcast – 2017-07-19

 

Erik Darling: James says, “Thank you for your DBA fundamentals videos.” You’re welcome, James. It’s a pleasure recording them. I have to re-record one on setting up Ola Hallengren’s scripts this week, so there’ll be another one coming down the pipe in the near future.

 

Can I trust the index usage DMVs?

Erik Darling: [Tashea] wants to know, “Can I trust sys.dmdb_index_usage_stats to drop indexes which are not used? What else resets its readings other than a SQL restart?” Well, there are a whole bunch of wonky things that happened in certain versions of SQL Server 2012 and 2014 where the behavior is different between service packs and CUs. There are certain ones, like I think almost all of 2012 until SP3 and probably 2014 up until SP1, where if you rebuild an index then that will clear out all the usage stats. And up until now, on many versions and editions, if you add an index or drop an index, it will also reset some things. So be very careful with that one, especially if because even if your system has been up for a long time, you may have rebuilt indexes, you may have added or dropped indexes, and there also might be really important indexes sitting around that just don’t get used frequently.

So one thing that I always caution people about is, you know, always remember to disable your indexes, don’t drop them. Because if some query comes along that runs, you know, for the super important guy who sits in an office that runs every three months or six months at some quarterly or whatever financial report or something, and there’s a very important index for the query that you just disables because it had one use in the last three months; well, you could be in trouble. So make sure you don’t drop indexes, make sure that any indexes you choose you just disable them.

One really good way to trend that kind of usage is by using sp_BlitzIndex. It’s a free script that we write. It’s available over at FirstResponderKit.org. If you head over there, you can download it for free and run it for free on as many servers as you want, for free. And it will give you the usage stats and you can copy out to an Excel file and kind of keep track of usage over time. You can also persist it to a table, if you’re that kind of person… I don’t know, maybe you are – maybe you’re not. Anyway, I hope that answers your question.

Richie Rump: Yeah, we call those people hoarders. Or Hordor, I don’t recall…

Erik Darling: I don’t know, Mordor hoarders, The Hoarders of Mordor? That would be a good show, The Hoarders of Mordor. Just Orcs with like a pile of bones, like this is mine…

Richie Rump: Just this random weapon and armor, just throw it all over the place…

Erik Darling: It’s like my office, it’s just garbage everywhere.

Richie Rump: Yeah, it’s just body parts, you know, maybe they categorize them, you know, and they’re just in different places.

Erik Darling: Yeah right, it’d be weird.

 

Why is CXPACKET my top wait percentage?

Erik Darling: James asks, or says, rather – says then asks, “My prod server has 12CPUs and I have my MAXDOP set to eight and cost threshold set to 65, however, I still see CXPACKET has 49% of the overall wait stats on my server; why?” James, you should know better than to tune wait stats by percentage. You’ve been to Brent’s classes, you know these things. You don’t tune things by what percentage they are. You should revisit some of Brent’s lessons on that. For the benefit of the group though, it is absolutely natural to have parallelism on your server, unless you’re running SharePoint or something else that forces you to set MAXDOP to one.

Trying to stamp out all evidence or all percentages of parallelism is not a terribly good idea. You still have parallel queries on your server because you still have queries where the serial plan has a cost of greater than 65, and those queries go parallel because the parallel plan was cheaper. That’s why you still have CXPACKET waits. Who cares if it’s 49% of the waits on your server? It all depends on what the total waits are and other fun things like that.

 

Should I use RCSI for kCura Relativity?

Erik Darling: Let’s see here, Jack asks, “Do you have clients successfully using RCSI with Relativity or Tunnel Vision?” Well I’ll be dead honest with you, I’ve never heard of Tunnel Vision. For Relativity though, no, and the reason for that is because every single Relativity query, behind the scenes, is generated with NOLOCK hints. And when you have query level hints like that, they override database level settings like snapshot or recommitted snapshot isolation. So it wouldn’t do any good to turn those on and overwhelm tempdb with all the modifications that come in only to have all your read queries use NOLOCK anyway. You would really need to do a lot of app redesign to benefit from having NOLOCK on there.

A long time ago, when I was a DBA at a Discovery shop, I opened up a [table] with Relativity and I was like, “Hey, how come you guys don’t use RCSI instead of just NOLOCK hints?” And at the time when the app was designed, and it’s still true to this day, many shops that run Relativity do not have a DBA or a DBA team or anyone really paying close or good attention to those servers in any way. If you just – if you install SQL Server and click next a whole bunch of times on 202, 2008, 2008 R2, all the older versions that were around when Relativity was first getting started, you didn’t have all those cool options and features kick in that you do with 2016 and up. So you wouldn’t have the installer going and creating multiple tempdb data files for you, you didn’t have anyone adhering to server best practices by not putting tempdb on the C drive. Just, you know, regular DBA stuff.

So, just the amount of work that has to go into supporting something like an optimistic isolation level is not trivial to a shop that does not have the correct staff for it. So, no is the answer and that’s the reason why; fun stuff, right? Fun stuff right, Richie?

Richie Rump: If you say so, sir…

Erik Darling: Are you enthralled yet? Are you riveted?

Richie Rump: I actually – someone put a link out for the collector’s edition of Star Trek Adventures the RPG and it’s like $500. It’s amazing but it’s like $500, and no…

Erik Darling: Well, I don’t know [crosstalk]

Richie Rump: Can I expense it?

Erik Darling: If it’s office equipment, I don’t see why not. I mean, what’s left on your hardware budget? You might as well…

Richie Rump: Not much…

Erik Darling: I got like 500 bucks left, I’m thinking about getting one of those desk-risers so that  I can have like a fake stand desk, but I just don’t know how that’s going to work with the wires. I’m kind of terrified of it. I’m like oh if I hit the button everything’s just going to tip over and unplug and it’s going to be…

Richie Rump: Well what you could do is, you could get a mount for your Hoss and mount it underneath the tabletop.

Erik Darling: For my what?

Richie Rump: For your Hossmachine, and like mount the machine underneath the desk, and that way you could raise it and lower it.

Erik Darling: I think I’m just going to get real strong and lift the desk up when I’m ready to stand.

Richie Rump: That’s too close to a table flip, man. You’re just going to be like… I just finished a job.

 

If a sync AG replica goes down, does the AG switch to async or go down?

Erik Darling: Steven asks a question about availability groups, which I will answer to the worst of my ability. “If Always On sync goes down, does it automatically change to async and just start filling the log, or does it go down?” There are a lot of weird things that happen when an availability group goes down that are different between 2012, 2014 and 2016. So in 2012, it’s like if one goes down, the whole thing goes down if quorum gets messed up.

On other versions, it could stay online, but either way, I think what you’re asking is if it stays online and it stays in good shape. But the answer is kind of no; you want to get things up and running as soon as possible. You also want to make sure that log backups, if it does failover – if it does failover, you want to make sure you get log backups going on the new primary. If the primary just goes down and you’re working off replicas, well you should work on getting that primary back up or making one of the replicas the new primary so that your AG is complete. That’s all I have to say about that. If Tara were here, she’d probably have a better answer. That’s my patchwork answer.

Richie Rump: Definitely more complete, sir.

Erik Darling: I don’t know what that means.

Richie Rump: I don’t know either, but it’s definitely better than my answer.

 

What is parameter sniffing?

Erik Darling: “What is parameter sniffing and how does it work?” God, well that’s the funny thing, is that it doesn’t if it’s parameter sniffing.

Richie Rump: No, it works then it breaks, right?

Erik Darling: Parameter sniffing is when you have some code that accepts parameters. And on first compile, that code executes and compiles with those parameters. And then on subsequent executions, that code repeats as if it’s received the same parameters, even if the parameters are different. So, that is parameter sniffing and that’s how it works. If you have a stored procedure, and let’s say it takes a date – let’s say it takes a start date and you say – let’s just say it takes a date for an equality. And you run your stored procedure and it brings back everything in the table that is equal to that date, and that date was 2017/7/19, right; so that’s today.

And then the next time that stored procedure runs, it runs and it runs as if – with that date you pass in as 2017/7/18, so that was yesterday, and all of a sudden you have to bring back a lot more rows because you had a lot more time to insert data yesterday. Let’s say you had a million rows on the 18th and you only have about 20 rows on the 19th. So that stored procedure that compiled and ran expecting to get 20 values back, all of a sudden, has to go and pull a million values back. And when it pulls that million values back, the execution plan that it uses to do that may be inadequate because a million is a lot more than 20. So there may be a case where a different execution plan would have been more appropriate for a different value. Say for a small amount of rows, you do a key lookup, you use a non-clustered index, you do a key lookup, you get some columns back from the clustered index. For the million rows, it just may make more sense to scan the whole clustered index and get the rows back that way. So it’s all out there in fun land – because it happens every single week because we talk about it every single week, I’m going to stick the link up there for it, Erland Sommarskog. So the man himself, or his self, depending on how you like to talk, has a blog post called “Slow in the Application, Fast in SSMS.” And what it does is attempt to explain to you parameter sniffing in biblical blog post fashion. If you are really interested and that is not where your question ends, I would read that entire blog post and see where it gets you. If you have more specific questions, come back and ask.

Richie Rump: So, if I did run into some parameter sniffing, what type of things would I do to fix it? See, this is what they call, in the business, a softball. Hit the softball.

Erik Darling: Okay, well if I was really crappy at my job, I’d restart SQL Server. If I was a little bit less crappy at my job, I would probably clear the entire plan cache. If I was slightly less crappy than that, I would probably clear that plan out of the plan cache. If I was a little bit less crappy, I might recompile, or throw a recompile hint on the stored procedure or the statement in the stored procedure that’s causing my parameter sniffing. If I was much less crappy, I might take a look at my indexes and try to figure out what a good index is to take bad choices away from the optimizer, regardless of the value that gets passed in. So, perhaps making that nonclustered index covering, or rearranging the key values in a non-clustered index to make it more aptly used, no matter what.

There are all sorts of things that you can do down the line including rewriting the query to need less columns to begin with and do all that sort of fun stuff.

Richie Rump: Yeah, what I found with parameter sniffing is sometimes the answer is a recompile, maybe it just doesn’t run frequently enough and, you know, you throw a recompile on it and it’s just not going to tax the system and you’re good to go. Sometimes it is the index, I mean, you just – when doing tuning, I’ve always gone, what’s my output goal? Right, you could get lost in tuning land and just get – oh I need to get it further down, further down, further down, but what’s your exit criteria? So you say, hey, the exit criteria, for me, is that it’s got to run another second and it doesn’t raise these metrics up this much. And when you hit that, you leave and you go onto the next problem.

Erik Darling: That sounds like a good plan to me.

 

Can I set isolation level in my connection string?

Erik Darling: So there is an Entity Framework question that I think you should take while I get a link for Kevin.

Richie Rump: Yes, so the question is, “Hi…” [crosstalk]

Erik Darling: It’s hi with an ellipsis afterwards, so it’s more like hi…

Richie Rump: “We have an Entity Framework application running on SQL Server 2014 Enterprise. I wanted to know if it’s possible to set the transaction isolation level at the connection string level.” So, I don’t believe so. I don’t believe that is the case. I’ve never tried it; I don’t recall ever seeing any parameters in the connection string for setting the isolation level. I’m pretty sure on EF6, the default isolation level is read committed snapshot; don’t quote me on that. I haven’t done much with transactions in Entity Framework proper. A lot of transactions that I’ve done in my career have been in queries themselves, and of the times I have used transactions, it’s been in code in Entity Framework. I don’t believe that is the case… [crosstalk]

Erik Darling:  I think I’ve seen people use, specifically for snapshot isolation, a separate connection pool that is set up to use that isolation level rather than go in with a default…

Richie Rump: Yeah…

Erik Darling: But I don’t know how to do that because I’m dumb at Entity Framework.

Richie Rump: Yeah, and you could totally do that. Say hey, for these queries, go and run this isolation level, right, but you need to know what those queries are. It’s not going to be a [crosstalk]… Now I could totally be wrong, got to Stack Overflow, ask the question there, that’s probably a better place for it. You could go to, you know, #EFhelp. I think Julie Lerman monitors that one. So as far as I recall, I don’t think you can, and if you need that kind of stuff, you need to change code. So there is not an easy, oh just let me change a connection string and not the application, I’m pretty sure you’re going to need to change the application.

Erik Darling: Or just throw it out and start a new one; that’s what I’d do.

Richie Rump: Listen, Entity Framework, it’s not broken, the way we use it as developers could be broken.

Erik Darling: Is usually horribly broken…

Richie Rump: Yes, and it’s just – there’s nothing different than what we see with SQL Server. We see all these bad implementations and we see all the breaking of best practices and we see all this crazy stuff going on, especially here at Brent Ozar Unlimited, because we see all these different things. And maybe you’ve actually walked into a new organization and you say, why would you do this and why is your backups here and where are your logs? Well it’s the same thing in developer land, you know, we just don’t understand our tools and we just start throwing things out there – and, hey look it works on my machine, it must work out there in production.

Erik Darling: Well the worst part about it is the overwhelming pressure to ship often gets in the way of doing things correctly, designing things correctly, you know, making sure that you have the right query, saying, okay well maybe this isn’t a good idea to run this 10,000 times a minute, perhaps we should wait three or four weeks before we ship this and invest in some sort of caching layer for this query instead. But that sort of stuff all gets shoveled by the wayside because you have some jerk sitting up top who wants you to ship, ship, ship to make someone happy.

Richie Rump: Let’s get it out, we’ll mark it as technical debt, we’ll deal with that down the road, just go, go, go. And the problem may be bigger than you think because, well, we don’t think very hard on the problem and what could happen down the road. So we don’t take a look at the risks and do a risk assessment and say, hey if we don’t do this it’s going to cost us this much money down the road because we don’t equivalate risk to dollars. And that’s, you know, a lot of problems with projects that I’ve seen is that no risk assessment to dollars; but it’s hard to do, frankly.

Erik Darling: The thing about technical debt is, it is a lot like regular debt and everyone wants to spend more instead.

Richie Rump: You know, the other thing about technical debt is that it’s very hard to quantify, right. I can’t put a dollar amount to it because it’s all – ooh it’s technical debt, you know, and how much will that cost to fix and how much would that cost to throw it down the path; nobody does that. Nobody says, oh if we make this technical decision, how much will that cost us in the long run? Because we don’t equate dollars to technology outside of, I need to purchase something.

Erik Darling: It’s like, you know, how much hardware can I mortgage to pay off my technical debt? But if I buy 512 more GB of RAM, the problem goes away. Well alright, guess I’m going to buy…

Richie Rump: You know – and from a developer perspective, there are certain things that we could do to, kind of, alleviate some of that, right. I write unit tests, okay, if I change something it will let me know if that is, if I have the right test and all that other good stuff, but even writing something simple as, you know, in 2017, as writing unit tests, that gets kicked on the curb too as, oh we’ll just do that later. And well, now we’ve got 15 people testing this thing, you know, and we make one change and everything breaks, you know. Well, that’s what you’re going to get.

Erik Darling: Yeah, well I guess for me, the thing is, it’s like okay, you want to write all these unit tests, but man, with a database where you have to have so many things to come up with a condition, or to reenact a condition, to make sure – it’s tough for some things. There are some really hard conditions to replicate. I’m not saying all of them and I’m not saying there’s not some laziness behind it, but there’s this perfect storm of, okay how do I replicate this particularly low memory condition with these parallel queries doing this? And figuring out how to write a single repeatable test for that once just to unit test this one thing over. It’s like, man…

Richie Rump: Yeah, unit testing queries is exponentially harder than unit testing code. It’s because we’re dealing with data, now I got to worry about setting up my data, I got to worry about running and I got to worry about what my result is. And doing all that is so much harder than saying, I’m just going to go ahead and fake this and mock this and do this and now I’ve got a result.  When I’m writing unit tests, and frankly I’ve never done it with queries because it is so stupid hard [crosstalk]… I was on one project where I did it, and we actually had to write our own unit test, kind of, module to do that because there was no tool out there to do this kind of thing. But still, you had to load the data, run it, and then you had to see what the expectation is and you had to do that, and it took forever. It’s one of those things – I still follow the 80/20 rule with running unit tests.

Hey, if I could get 80% of this stuff and the other 20% is going to take me another day or two or whatever, I’m just going to punt on that stuff and I’m only going to get the stuff that’s going to get the main course on it, right. And let’s keep moving forward and if we run into a problem, I’m going to write a unit test for it and we’re just going to keep moving forward.

Erik Darling: Right, so it’s like the main goal is to get things working, and then you can catch the edge cases on down the line.

Richie Rump: Yeah, I don’t get paid to write unit tests, I write unit tests because I get paid to verify my functionality, not because I want all these unit tests to do that. Writing a unit test is not fun.

Erik Darling: No, it doesn’t look like fun. Like I have a thing to get to in GitHub where I need to write code or stored procedures that will inflict upon a server various warnings that BlitzCache and BlitzQueryStore will flag, and I’m dreading that because there are so many things – I’m going to be like –  a lot of these stored procedures are going to have to add and drop indexes or do something else awful or run for a long time to get these things to happen. Like there’s some stuff you see out in the wild where you’re like, oh I can easily write code that will catch that and I can do that on the spot because this condition exists right here where I need it, but recreating that condition sucks. Databases are hard. People should get paid more for working with them.

Richie Rump: I know, I know.

Erik Darling: You hear that, Brent? People should get paid more for working with data.

Richie Rump: I’m very happy, sir. May I have another day.

(Brent writes: get back in the data mine, peons.)

 

Can transactions help avoid deadlocking?

Richie Rump: We’ll wrap this thing up here. So here’s another interesting question, “Can begin and end transaction encapsulating DMLs help avoid deadlocking?”

Erik Darling: No, no it can’t. No, it won’t help with that. You will have…

Richie Rump: Why?

Erik Darling: Because you will have just have things wrapped in a BEGINTRAN and ENDTRAN, it doesn’t matter. It does not matter, it won’t help you. There is another question…

Richie Rump: That question fell flat. [crosstalk] I thought there was more of a story there, but I guess not…

Erik Darling: No, no, no BEGINTRAN and ENDTRAN, all that does – I mean, that will isolate a single transaction, it’s not going to help you with other transactions though.

 

I have this problem with Idera…

Erik Darling: Let’s see here, “Idera SQL monitoring tool and services…” Dude, open a ticket with Idera. Do I look like Idera tech support? You pay them money, probably…

Richie Rump: Now that you mention it…

Erik Darling: You probably pay them money; you should open a support ticket with them. I have no idea why Idera would stop working. I have never used Idera, so I couldn’t tell you there. Couldn’t tell you on that one.

 

How should I test storage performance?

Erik Darling: Let’s see here, Uday wants to ask, “Do you have any recommendations on how to test storage performance?” That’s a good one. CrystalDiskMark is a good tool; it’s free. And also DISKSPD is a good tool as well. Those are the ones that I would go to there.

Richie Rump: So is there – I’m going to cut you off man, just cut you off. Is there an asset management tool besides MAP that reports on SQL Server versions… [crosstalk] Oh, look at that, now you cut me off that I cut you off.

Erik Darling: Yeah. [crosstalk]

Richie Rump: Don’t cross the streams, Ray…

Erik Darling: That’s a bad idea. So I think Redgate has some stuff like that. Redgate has some tools that do that. They’re not free though, at least not for more than two weeks; or not if you want to get all sorts of stuff out of them. I don’t know the names of all the Redgate tools, because they have a new one like every week and I’m not sure what they do past, like, SQL Prompt. There was SQL Octopus and SQL Lighthouse and SQL Crack… I don’t know, SQL Titanic – but they have tools out there that will do server discovery, I want to say, so I would go and check out “el Red Gatto,” the red cat.

Richie Rump: I know Kendal Van Dyke wrote a tool in PowerShell that does some of that stuff…

Erik Darling: I want to say I tried to use that at my last job to find stuff out when I first started and I did not have good luck with that. I don’t know if he keeps it up to date.

Richie Rump: It’s called SQL Power Doc, that’s what it’s called. Try it, it’s free. I mean, it’s only going to cost you your time, right. And if it gets you what you want, then you can move onto some of the paid versions – or if it doesn’t then you go onto some paid versions. I know he put a lot of work into it and I remember him talking about that like way back, I’m doing this power thing… And I’m like…

Erik Darling: Yeah, I mean, I just wonder what SQL MAP does differently than PowerShell. Like it probably queries the same things, you would think.

Richie Rump: Yeah, I don’t know. I know that – he was a consultant at the time and he’s  moved on to Microsoft since – but he was a consultant and he would go into environments and he would have no idea where any of the server are or what is actually running, so he wrote this thing to actually go and find all these servers and figure out what’s on them, where they’re running and doing all this stuff. So if you’re looking for something like that, you know, that may be something for you. Try it and let us know.

Erik Darling: I mean, especially because he went to work for Microsoft, I’d be pleasantly surprised if it was still being worked on, still in development and all that stuff.

Richie Rump: What are you trying to say?

Erik Darling: I’m trying to say that Microsoft… [crosstalk]…

Richie Rump: Ready to be updated? Yeah, I know that, thank you, thanks, Erik…

Erik Darling: Yeah, we need to get those [segment] eliminated things from Columnstore in there because there’s a massive rush to the Columnstore mall, next to the Columnstore store. Everyone’s buying into that one.

Richie Rump: I’m not. Let them go, let them go.

Erik Darling: Alright, fine. Anyway, we’re at 12:46…

Richie Rump: Yeah, baseball’s already started so we’d better [crosstalk] so we can watch the baseball…

Erik Darling: Goodbye everyone, thank you for joining us, we’ll see you next week, hopefully, if I ever get some sleep. Adios.