This week Brent and Richie discuss the latest big Intel CPU security flaw, distributed transactions, rebuilding indexes, SQL Server health check, GDPR regulations, Availability Groups issues, improving server performance, patching for instances that share DBs from multiple applications, using heaps, and more…
You can register to attend next week’s Office Hours, or subscribe to our podcast to listen on the go.
Enjoy the Podcast?
Don’t miss an episode, subscribe via iTunes, Stitcher or RSS.
Leave us a review in iTunes
Office Hours Webcast – 2018-01-03
Brent Ozar: Michael Tilley asks, “Any interesting holiday SQL Server, or other, reading?” The big thing that’s going on right now, and I should pop my web browser open for it, is the register – we’ll go get the register. So there’s a really big CPU bug going out – oh, and I don’t have an ad-blocker on this VM, so of course, I’m not going to find it. Let’s see here – so Intel CPU flaw… There’s a problem with – I don’t really want to click on the Guardian, but it’s the top one up there, so I guess that’s what I’m going to take. There’s a huge security flaw that got found in all recent – like over the last ten years – Intel processors. They are Microsoft, Linux, Apple, everybody working on patches as we speak. But it’s going to involve a hit of, say, 20% to 30% max on system calls from applications.
So people are worried about – the classic example is databases that constantly have to write to the transaction log, that that could get 20% to 30% slower. So, that’s been the big thing that I’ve been reading about lately; kind of interesting.
Richie Rump: Woo slower databases. Give us a call; we can help you with that.
Brent Ozar: Man, sadly we can. And it’s funny because you think about things that you don’t usually have to worry about, and people are like, “Oh my god, now our cloud prices will suddenly get 30% higher.” And I don’t think that they will. I think that the cloud vendors are going to suck that up in order to keep you in the cloud. Otherwise, just one of the vendors will do a price cut while everybody else tries to do a raise, and then you would just consider switching over.
Richie Rump: Yep. The other news that I saw was someone had wrote – I don’t have a link or anything like that. Again, I don’t know where my car is and I’m sitting here at home. Someone did a really great write-up about Docker and essentially said that this was like the year that Docker failed and kind of went through all the bullet points of what happened this year with Docker and the company. Not necessarily bashing the technology, but bashing the company in itself; and that was a super good read. I think I re-Tweeted it, so I don’t know, go look at my Twitter if you’re really interested, and scroll through all the older crap that I’ve been Tweeting about.
Brent Ozar: It did, it sounded like the fumbled a lot of political things really badly. The technology is neat. I just started last week using Docker on my Mac Pro at home, just because all of the development that I do in testing, I start by restoring the Stack Overflow database or restoring a client’s database. So I’m like, at the end of the day, who cares? I don’t need persistence, so a Docker container works okay. But then the first I’m I tried to run troubleshooting, I ran into all kinds of problems. I’m like, how do you troubleshoot a headless Docker container? [crosstalk]
Richie Rump: I mean I go straight to folks like Cecil Phillip, who’s used it a ton, and things like that. I’m still trying to grock a lot of stuff. Because I know a lot of people love to put their dev tools in Docker containers and things like that, and I’m like – I don’t need that. I could just do an install on my computer and it’s going to lay there until I kill the machine. I mean, until I get a new machine and it’s there.
Brent Ozar: One of the questions asked just now – I should go look and see who the name was… garland says, “Any place suggestions to looking up setting up Linux containers with persisted storage on a Windows host?” Generally, I wouldn’t recommend Docker containers on a Windows host; especially not if you’re dealing with persistent storage. Just do a VM and that’s exactly what you’re looking for. But I do have good resources for a Mac. I don’t run Windows in a VM if I can avoid it, just in terms of a performance overhead. If you Google for Aaron Bertrand SQL Server Linux, he has a great specific walkthrough of exactly everything that you need to do in order to install it. So it’s really super nicely done. Otherwise I just really wouldn’t recommend it on Windows; it doesn’t make sense.
Brent Ozar: Rob asks, “On a Win 2016, SQL 2016 Always On Availability Group with three nodes, are distributed transactions supported between the nodes and necessary? Also, with a three node…” So distributed transactions are a real… with Always On Availability Groups. So if you search for SQL Server Always On Availability Groups distributed transactions, there’s a huge list of gotchas around differences between 2016 and 2017 whether databases can be in the same availability group or different availability groups.
This post will get you into all kinds of details here. Basically, 2017 supports distributed transactions for databases – 2017 supports this. 2016 did but only as long as the distributed transactions were going across two or more different databases on different AGs on different nodes. You couldn’t do a distributed transaction across two databases on the same node; so that’s kind of hairy there.
Brent Ozar: James May says, “Do you take submissions for Bad Idea Jeans posts?” That’s probably the only thing that we would take submissions for. We have a soft spot in our heart for Bad Idea Jeans with SQL Server. [crosstalk] And Richie sees it from being in the company chat room. We sit there and we just have the worst ideas all day long about how to do something.
Richie Rump: And you know – I’m usually heads down, right, so I’m typically behind the real time of Slack. And I raise my head, it’s like, what is going on? This is all terrible; stop.
Brent Ozar: And he gets to see highlights from client code. We’ll show something, like, “Check out what this does…” Why would you ever do that?
Richie Rump: Yeah, I think I was razzing you earlier because your comment on one of the clients was, “Oh, I bet you it’s missing indexes.” And I’m like, “It’s always missing indexes, Brent. It’s always that with you.”
Brent Ozar: It’s funny, yeah. And there’s some things that you can pretty much guarantee are going to show up during critical care. Just like there’s some things you can guarantee will show up during an Office Hours; Erland Sommarskog’s post Slow in the App, Fast in SSMS. It’s going to happen.
Richie Rump: Yeah, that’s seven minutes there, thanks.
Brent Ozar: Mike asks, “SPLA licensing – my manager’s talking about this but there isn’t much online about it. Do you know anything about this licensing model?” Yes. So if you are going to be a service provider – if you are going to sell access to SQL Server to other people, like if you’re going to have an application that uses SQL Server as a backend, people are going to query SQL Server directly – you’re going to be a service provider or software provider, you basically rent your SQL Server licensing by the month. Pricing is all totally down to what you negotiate with Microsoft. They publish a guide about it; resellers get involved. But generally, you have to go talk to a reseller.
The nice part that I love about it is because it’s every month, you can change capacity. Normally as a performance tuner, unless you’re like Richie and I working in the cloud, if you do a whole bunch of performance tuning and you cut your workloads, you don’t get a refund the next month. You can’t suddenly cut your bill. But with SPLA licensing, you can.
Richie Rump: That’s cool; I didn’t know they could do that.
Brent Ozar: Yeah, every single month it changes. So you can, all of a sudden, go bum rush with a brand new client and be lit up in the next month.
Brent Ozar: Terry says, “We have several SQL Server Enterprise Edition 2014 and 2016 servers with nightly scheduled index reorg maintenance. Should I forego reorganizing my indexes and just to rebuilds, considering we have Enterprise?” The big problem is transaction log activity. When you go rebuild indexes from scratch, you’re building a whole new copy of the object. It’s a logged activity, which means your log backups grow, your transaction log file may grow, depending on how quickly you take log backups. It’s more data that you send to your mirrors, Always On Availability Group replicas. So it will tend to just inflate the amount of transaction log activity. That’s the only big con I can think of.
I’m kind of known for saying you should ease up on the index maintenance, period, because it’s not like people really go, “Oh my god it’s suddenly faster now.” But if you have the luxury of a maintenance window to do it and you want to do it, it’s not so bad.
One other thing: it’s called online index rebuilds, but it’s not 100% online. At the end of the operation, it has to swap in the newly rebuilt index, and you can end up with a schema mod lock on that. So queries with NOLOCK actually get blocked behind that. So just be careful around that.
Brent Ozer: Daryl says, “I know I need to live in SQL Server health check…” Daryl, I think you’re missing a verb there. He says, “I don’t have a stress check process in place. What’s a good starting place to address replica timeouts in Availability Groups? I pushed and pushed on our storage guy and our vendor finally admitted we had saturated the SSDs.”
Richie Rump: When you finally get the storage vendor to admit anything, consider it a success. You have won the day; go home victorious.
Brent Ozar: Yeah, and I would want to know more about what specifically validated the – or what specifically he meant by saturated the SSD. Because this is kind of longer than something we could quickly address over Office Hours, I’d say – he’s saying, “I went through your SQL Server health checklist.” Yeah, so we really publish pages of stuff on how to do that. Then, to dig even deeper, that’s where our consulting comes in. So it’s kind of beyond what I could answer quickly.
You will notice, as the slide deck goes through; you’ll see a place to post multi-paragraph questions at Stack Exchange. That could be one place that you would start. Just make sure to zoom in as closely as possible and define your question. I’m not exactly sure what you mean by the question here. AG replica timeouts, be more specific about exactly what the error message is that you’re getting, what problem it’s causing, because it may not be that big of a deal. If you’re doing synchronous replicas between two places, the first thing I’d ask is how much unnecessary maintenance are you doing, like index rebuilds all the time? If you’re doing big huge chunks of index rebuilds in a synchronous AG, that’s a great reason to run into problems.
Brent Ozar: Oh man, Garland has a fun question. He says, “Referencing your post on the GDPR, was I correct in understanding that if there’s any data from the EU held in the DB, you need to be in compliance?” So this is pretty tricky. One of the things that causes a problem is, if you have an EU citizen – I should say first that I’m not a lawyer. I don’t provide legal advice. I don’t provide compliance advice. I’m only going to relay things to you the way I understand it and the way that our attorneys have come to understand it for us.
If a European citizen gives you data and you take it as some kind of data processor, you’re going to do something with it, you’re going to send them emails, you’re going to analyze their data, you’re going to build a report with their data – they have the right to be forgotten. The GDPR is this new regulation for Eu citizens where an EU citizen can contact you, as their data processor, and say, “I would like to exercise my right to be forgotten.” You have to delete all of their relevant data that you don’t need for your own legal purposes.
So for example, let’s say that I sell an EU citizen a piece of training. I have to keep some of that data to track that I sold him or her something for training, how much tax I paid, where he was at the time. But I have to track what I delete, what I keep, why I’m allowed to keep it – then, if that person goes and contacts the EU and files a complaint, the EU can ask for my audit records. And they have to be able to see everything that I’ve deleted.
It gets even trickier when I start to look at backups. So there’s some debate going on now about – it may even include having to delete data from my backups, or have some kind of auditable script that the instant a backup is restored, I can guarantee their data is expunged out of there.
Richie Rump: So even me thinking about that – and we don’t have a lot of systems here, we’re pretty small potatoes in stuff like that. So we’re using cloud-based systems all over the place, and even just looking at some of the newer database stuff, like in RDS Aurora, there’s an automatic backup. Where does it go? How do I have access to it and how do I even delete stuff inside that backup? There’s a lot of questions. For a small company like us, how would we even approach something like that?
Brent Ozar: Yeah, and then I also have to guarantee – like all of us have laptops, we have desktops. That means we also have to have some kind of written process where every employee certifies that they don’t have any copies of anybody’s private data – anything like that. And it’s easy to argue, well you should never use production data in development, for example. Well, it’s kind of easier said than done, especially when you’re as small as us and we do things like a mailing list., where we’ll keep lists of customers, what they’ve clicked on – so we can tailor content better for them.
You’ll know from our mailing list, our Monday morning newsletters, we track the ten most popular clicked on things and we send you out confirmations of those at the end of the year, “Hey, here’s what people really like to click on.” Well, that’s got people’s data in it. That means we got to be really careful with it. It sucks.
Brent Ozar: Daryl says, “A connection timeout…” Daryl’s following up with his Availability Group piece. He says, “A connection timeout has occurred from one replica to another; either a networking issue or an availability replica has transitioned to the resolving role.” Typically when I’ve seen that happen, it’s been that a replica is just completely overwhelmed, like the disk – solid state – not solid state, but usually magnetic storage – is frozen. It’s taking 15-second timeout errors, so it’s typically due to under-provisioned hardware.
Brent Ozar: Eric says, “I inherited a few SQL Servers with an IBM 7000 fiber channel SAN. All my drives are in RAID 5. Their SAN admin said that solid state drives are not available; what should I do next, my tempdb latency is horrible?” This is actually pretty easy. You just go get a couple of local solid state drives and you put them in the SQL Server. So if you’re running on bare metal – like rack mounted boxes, Blade Servers, et cetera, you can just put really cheap commodity SSDs inside the Blades or rack-mounted servers themselves.
If you’re dealing with virtualization then you don’t have that as an option. Everything’s out on the San and it just kind of is what it is. It’s one of those things that you make a note of for when users are complaining. Just because tempdb latency sucks, doesn’t mean users are complaining either. A classic thing, we see really slow tempdb all the time with end user systems, and it’s just with production servers. But it’s just that CHECKDB is running in the middle of the night and tempdb sucks then, but no one’s actually complaining. Not quite so bad.
So back on the GDPR stuff, I mean – for those of you who aren’t hip with that, what that ended up meaning was that we decided not to sell training classes to EU folks. It’s so hard because a person form the European Union can be standing on American soil using a Chinese credit card. How do you make for doggone sure that it’s actually someone from the EU? So what we’re working with, with the plugin vendor, is basically a little check box that, “Yes, I verify that I’m not subject to GDPR regulations.” Just to make sure that – we can’t ever erase all risk. We can’t make sure that somebody from the EU doesn’t send us an execution plan with personally identifiable data in it. At that point, we’ve just got to do our best. Paste The Plan is the great example of that, and it’s something that we just have to think of. And people do ask us to delete their data out of Paste The Plan from time to time, and we do that.
Richie Rump: And after we have in our instruction, say, we will not delete stuff from Paste The Plan, we still delete stuff from Paste the Plan. I think the other thing is the whole GDPR thing is so vague. Nobody really knows what it means right now. I think if, over time, they enhance it a little bit to where we know exactly where our risk is, we probably would take a look at that and maybe open that back up again. But because it’s so vague right now and it has such a huge risk for a small company like us, I agree with your decision just to cut our losses and be done with it until the EU can figure out what exactly does GDPR mean.
Brent Ozar: There will be court cases that set precedents. It will start to become normal about what you are expected to do, what you’re not expected to do. More third-party vendors, WooCommerce, Automattic will start selling tools that make it easier to be compliant with GDPR. And we’ll be knocking the door right back down to start letting European people get back into training again; as soon as we just have a better idea from the government what those court cases will look like – what we need o do to be compliant. I just don’t want to be the canary in that coalmine.
Richie Rump: I think there’s good laws and then there’s good laws poorly written. And I think this is kind of one of those things. It’s a good law but it’s just so extremely vague that, you know, what is even our responsibility as a corporation dealing with the EU? Nobody can tell us, and that’s part of the problem. We can’t quantify our risk, and whenever you can quantify – gosh, do I sound like a project manager? “When you can’t quantify your risk, then it’s an ultimate super high risk.” You know, so…
Brent Ozar: And especially when we quantify the reward side of it. It’s less than 5% of our revenue, so I was like it’s not really worth it. Michael Tilley adds, “by the way, GDPR enforcement doesn’t go live until 25th of May.” Yeah, but you don’t want to wait until then to try and become compliant. You’ve got to start laying your groundwork long in advance of that.
Brent Ozar: Bob says, “Do you guys have any advice on patching for instances that share databases from multiple applications?” SO he has like one SQL Server that hosts, probably, SharePoint, Dynamics, whatever. So cumulative updates come out every 60 days, and what I’ll do is go through the list of updates that come out in each cumulative update; all the hot fixes. We point and laugh at it in the company chat room. Like, incorrect results bug, bug that causes crashing. So what I’ll just send those out to is my application owners and say, “Alright everybody, here’s the risks. Are you cool with incorrect results coming out of your queries?” And then usually that’s pretty much the end of it right there. People go, “No, I’m not okay with incorrect results from my queries.”
Richie Rump: Then you should stop using NOLOCK.
Brent Ozar: Yes, oh god, so sad.
Brent Ozar: Next up, Paul says, “I was always under the guidance that if a table does not have a clustered index then rows being deleted will not free space in the file.” It depends. So if you have lock escalation – if your huge delete statement involves an escalation up to a table level lock and you’re not under RCSI, for example, you will get the rows cleaned up out of there. Erik Darling just published a blog post yesterday on this – yesterday or two days ago. So if you go to BrentOzar.com/blog. And today’s was Tara’s post on network issues or thread pool waits. If I scroll down a little further, here’s this post from Erik Darling: Heaps, Deletes and Optimistic Isolation Levels. So you can read that. He’s even got demo scripts to show you that you can walk through.
Paul continues, “I have not found this to be true on my table with two columns – PK is int.” Well, as soon as you say PK, I start to worry about – I wonder if you really do have a clustered index or if you have a non clustered primary key; which is fairly unusual.
Richie Rump: Yeah, by default you get the clustered index though.
Brent Ozar: Whenever you declare something as a primary key?
Richie Rump: Yes.
Brent Ozar: he says, “Why is this? Or better yet, when do you lose space in a table with no clustered index?” It’s just that SQL Server, by default, doesn’t de-allocate pages in a heap. And probably the easiest way to show that to you is just fire up a database and give you an example here. So I’m going to go grab a presentation script so that I can show you.
So here’s what we’re going to do. We are going to create a table that I’m going to call Heaps of Sadness. So this is a table with no clustered index. I’ll even zoom in a little so you guys can see the T-SQL bigger. I’m creating a table with an identity field. It’s not declared as a primary key. There’s no clustered index on this table. And then I’m dumping in about a quarter million rows from the table sysmessages. Whenever you want a whole bunch of garbage fast, sysmessages is one of the fastest ways to get it.
Then I’m going to go run sp_BlitzIndex – and the important thing out of sp_BlitzIndex is that it shows me the heap and the size of it. The size of this thing is about a quarter million rows and it’s 2.1GB in size. When I’m going to do next is go delete 1000 rows out of it 100 times. Basically, I’m simulating insert, update and delete activity going out across the – well, just specifically delete activity. I’m doing this 100 times. I’m not deleting the entire table in one fell swoop; just deleting randomly 1000 rows across 100 times each.
When that finishes I’m going to go through and run sp_BlitzIndex again and see how big Heaps of Sadness is. Remember, it was 279,000 rows at 2.1GB. Now it’s 179,000 rows at 2.1GB. So these pages are still allocated, even though in some cases there’s no rows on those pages. And when you think about what a heap is for, it kind of makes sense. Everything in SQL Server in heaps is stored in 8K pages. Well SQL Server wants to keep these 8K pages lying around so that you can go insert data into them again quickly.
Heaps are fantastic when you dump a bunch of data into a database and then yank all the data right back out again. Nightly staging tables for data warehouses is a great example of that; just much less useful when you’re trying to do insert, update and delete activity concurrently, like we’re doing here.
When this thing finishes, I’m doing these delete 200 times this time so that I can delete out that last 179,000 rows. And then when it finishes, we’ll go through and run sp_BlitzIndex again. You’ll already know, based on the spec of this demo so far, this thing is not going to be a zero space used for that table. So for those of you who are just learning that for the first time, does it make sense to have heaps? Sure, as staging tables in data warehouses where every day you dump all the data in and then you pull it all back out in one fell swoop. It makes perfect sense there.
So now, here I am with my heap, zero rows and 1.6GB in space. And I know what you’re thinking; who cares if it’s still got a lot of pages attached? It’s not like I’m going to shrink my database and reclaim those pages anyway. Well here’s why you care. If I turn on set statistics I/O, which Richie is intimately familiar with, and then I go run SELECT* from Heaps of Sadness – if I look at how many logical reads we had to do to execute this query, SQL Server reads 200,000 8K pages every time this query runs just trying to see if there’s any rows in Heaps of Sadness. “Are there any now? I better go check all 200,000 pages again. What about now? I better go check again.”
If you really want those pages de-allocated, you either need to do a delete with TABLOCK, you need to truncate the table – those are both good ways – or do an alter table rebuild. All those will get you your rows back.
Richie Rump: Space.
Brent Ozar: Space, exactly, get you your space back. It would be hilarious if they got you your rows back…
Richie Rump: “What, how – I deleted this data. Why is it back?” It’s kind of like me testing this update or tool yesterday, “Why won’t this update? I don’t understand.”
Brent Ozar: So that will be the last thing we’ll do before we break for today. Speaking of getting your data back, let’s call this My Table. I’m going to build a table variable and call it My Table. And table – we’re going to put in here ID, int, identity 1,1, person name, varchar(50)); insert into My Table, person name, values, Richie Rump. Then we’re going to add me into there as well. Copy, paste – so we’re going to put Richie and I sitting in a table variable – insert kissing joke here. SELECT* from My Table. Now, y’all know when I declare a table variable and I insert two rows into it and then I select them back out, I’m going to get both me and Richie. Watch when I do this. BEGINTRAN, and then rollback. I’m going to insert our rows inside a transaction, but then I’m going to roll them back and then execute my select.
When I execute this, I still get our data. Table variables ignore transactions. So you can do a BEGINTRAN, dump a bunch of stuff into a table variable, roll it back – it’s still in that table variable. Speaking of getting your data back out of nowhere, Itzik Ben-Gan has this great presentation where he calls it Bug or Feature. He shows you something and then you have to guess whether this is a bug or a feature. In many cases it’s kind of both, it’s just whether you’re aware of it ahead of time. If you’re aware of it ahead of time then it’s called a feature. So here is the perfect example for that… [crosstalk]…
This makes sense for those of us who do nightly warehouse data big loading stored procedures where I want to write lots of status messages into something. And then if my transaction explodes, I still want the status messages form the whole entire way, I can insert them into a table variable as I go along. Then, because they’re impervious to rollbacks, I can still get the stuff out of the table variable at the conclusion of my stored procedure. Somebody in here was asking about Bad Idea Jeans. There is your Bad Idea Jeans for the day. Well thanks everybody for hanging out with us at this Office Hours, and we will see you all next week. Adios.