In this week’s Office Hours Q&A, we talk about the pains of patching lots of SQL Servers, how to move databases from one drive to another with minimal downtime, the difference between NOLOCK and READ UNCOMMITTED, and assign Doug all future calls for Dynamics tuning.
SQL Server Agent is a job scheduler.
When we first get started managing SQL Server, we usually use it for backups, index maintenance, DBCC CHECKDB, and other common maintenance tasks.
And then one day, we say, “Hey, I need this T-SQL script to run once a day.” We add an Agent job for it. It’s a rare occasion, we think. Not really a big deal.
The next thing you know, the entire business’s lifeline depends on dozens – or heaven forbid, hundreds or thousands – of intricately interwoven and ill-documented Agent jobs. Nobody knows what they do – they just have to run.
Except many of them don’t. They fail for one reason or another, but then they work the next time, and we just hope and pray everybody ignores it.
Eventually, some of them start failing permanently. When we try to figure out what’s going on with them, we ask around about who wrote the script, and it always turns out it was some guy who left the company months ago. Nobody knows how to debug his stuff, or if it even matters. We’ll just leave it in place and maybe it will start working again.
Don’t let yourself get to this place.
To make your life easier, keep application-logic Agent jobs out of your production SQL Servers.
If someone needs application logic, and it has to be an Agent job, it doesn’t actually have to be on the production database server. Give them their own Standard Edition VM where they can go to town, creating any Agent jobs they want. However, those users are 100% responsible for managing the success and failure of their Agent jobs because these are applications, not databases. There are no databases on this server – or if there are, they ain’t backed up or production-quality. This server is for jobs that can point at any SQL Server.
In addition, give them a VM in the DR environment where they can sync their jobs. Source control and deployment are totally up to them.
This lets you focus: if there’s a failing job on the production SQL Server, you own it.
To help you pull it off, consider using my SQL Server support matrix. It sets clear expectations about what you’ll allow in dev, QA, production, and mission-critical production boxes.
As if there weren’t enough reasons
In my last blog post I talked about different things that cause plans, or zones of plans, to execute serially. One of the items I covered was computed columns that reference scalar functions. We know that they’ll force queries to run serially, but what about other SQL stuff?
Oh no my index is fragmented
If you’re running Expensive Edition, index rebuilds can be both online and parallel. That’s pretty cool, because it keeps all your gadgets and gizmos mostly available during the whole operation, and the parallel bit usually makes things faster.
That is, unless you have a computed column in there that references a scalar function. I decided to write my test function to not perform any data access so it could be persisted. It’s dead simple, and I’m tacking it on to a column in the PostLinks table of the Stack Overflow database.
CREATE FUNCTION dbo.PIDMultiplier (@pid INT)
RETURNS INT
WITH RETURNS NULL ON NULL INPUT, SCHEMABINDING
AS
BEGIN
    DECLARE @Out INT;
    SELECT @Out = @pid * 2;
    RETURN @Out;
END;
GO

ALTER TABLE [dbo].[PostLinks]
    ADD [Multiplied] AS dbo.[PIDMultiplier]([PostId]) PERSISTED;
For this one, all we have to do is turn on actual execution plans and rebuild the index, then drop the column and rebuild again.
ALTER TABLE [dbo].[PostLinks] REBUILD WITH (ONLINE = ON);

ALTER TABLE [dbo].[PostLinks] DROP COLUMN [Multiplied];
Here are my execution plans. The rebuild I ran when the table had my computed column in it stayed serial.
Parallel, sans computed column:
But there’s a bigger fish in the pond
Probably the most important maintenance item you should be doing, aside from backups, is running DBCC CHECKDB. Seriously, if you’re not doing them both, start today. Ola Hallengren has basically done all the work for you. Back when I had a real job, I used his scripts everywhere.
Before we were so rudely interrupted by a soap box, we were talking about parallelism. This part was a little bit more complicated, but don’t worry, you don’t have to follow along. Just look at the pretty pictures. Sleep now. Yes. Sleep.
The first couple times I tried, the DBCC check never went parallel. Since I’m on my laptop, and not a production server, I can set Cost Threshold for Parallelism to 0. You read that right, ZE-RO! Hold onto your drool dish.
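If you want to follow along on a test box, dropping Cost Threshold for Parallelism to zero is a quick sp_configure change. A sketch, assuming “show advanced options” isn’t already enabled on your instance:

```sql
-- Test servers only: let even trivially cheap plans go parallel.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

EXEC sp_configure 'cost threshold for parallelism', 0;
RECONFIGURE;
```

Remember to put it back to something sane when you’re done experimenting.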
With that set, I fire up Ye Olde Oaken sp_BlitzTrace so I can capture everything with Extended Events. You’ll need all three commands, but you’ll probably have to change @SessionId, and you may have to change @TargetPath. Run the first command to start your session up.
EXEC [dbo].[sp_BlitzTrace]
    @SessionId = 61,
    @Action = 'start',
    @TargetPath = 'c:\temp\',
    @TraceParallelism = 1,
    @TraceExecutionPlansAndKillMyPerformance = 1;

EXEC [dbo].[sp_BlitzTrace] @Action = 'stop';

EXEC [dbo].[sp_BlitzTrace] @Action = 'read';
With that running, toss in your DBCC command. I’m only using DBCC CHECKTABLE here to simplify. Rest assured, if you run DBCC CHECKDB, the CHECKTABLE part is included. The only checks that DBCC CHECKDB doesn’t run are CHECKIDENT and CHECKCONSTRAINTS. Everything else is included.
DBCC CHECKTABLE('dbo.PostLinks') WITH NO_INFOMSGS, ALL_ERRORMSGS;

ALTER TABLE [dbo].[PostLinks]
    ADD [Multiplied] AS dbo.[PIDMultiplier]([PostId]) PERSISTED;
Run DBCC CHECKTABLE, add the computed column back, and then run it again. When those finish, run the sp_BlitzTrace commands to stop and read session data. You should see execution plans for each run, and they should be way different.
So even DBCC checks are serialized. Crazy, right? I’d been hearing about performance hits to varying degrees when running DBCC checks against tables with computed columns for a while, but never knew why. There may be a separate reason for regular computed columns vs. ones that reference scalar functions. When I took the equivalent SQL out of a function, the DBCC check ran parallel.
ALTER TABLE [dbo].[PostLinks] ADD [Multiplied] AS [PostId] * 2 PERSISTED
Of course, those online index rebuilds running single threaded might be a blessing in disguise, if you haven’t patched SQL recently.
I don’t have much of a grand closing paragraph here. These things can seriously mess you up for a lot of reasons. If you’re a vendor, please get away from using scalar functions, and please please don’t use them in computed columns.
Thanks for reading!
In this week’s Office Hours Q&A, the entire team avoids spinlocks, then discusses the new Cardinality Estimator, the virtual inserted/deleted tables, monitoring tools, restoring single tables, DBA Viagra, and more.
When you want to know why the server’s slow, you can ask Brent. sp_AskBrent® checks a lot of DMVs to find common causes of server slowness.
When you turn on @ExpertMode = 1, you can see additional information like file stats, wait stats, and Perfmon counters.
To log those to a table permanently, check out the @Output parameters. In this example, I’m writing all of the working tables to the DBAtools database:
EXEC dbo.sp_AskBrent
    @OutputDatabaseName = 'DBAtools',
    @OutputSchemaName = 'dbo',
    @OutputTableName = 'AskBrentResults',
    @OutputTableNameFileStats = 'AskBrentResults_FileStats',
    @OutputTableNamePerfmonStats = 'AskBrentResults_PerfmonStats',
    @OutputTableNameWaitStats = 'AskBrentResults_WaitStats';
That creates the tables if they don’t already exist, and then adds the result sets each time it runs so you can track performance over time.
Why was the server slow yesterday afternoon?
When it’s time to query the data back out, you can add the @AsOf parameter with a date & time to see the main result set of alert results as of that moment in time:
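Here’s a sketch of what that looks like – the database, schema, and table names match the earlier logging example, and the timestamp is just an illustrative value:

```sql
EXEC dbo.sp_AskBrent
    @OutputDatabaseName = 'DBAtools',
    @OutputSchemaName = 'dbo',
    @OutputTableName = 'AskBrentResults',
    @AsOf = '2016-01-14 16:30';
```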
That returns results within 15 minutes of either side of your time so you can see if there were any problems leading up to (or after) that moment. It’s really useful when someone says the server was slow last night.
However, that only shows the basic result set of sp_AskBrent® – not the detailed file/Perfmon/wait stats results. For those, you’re best off querying the tables directly for trending purposes.
v20: You Can Query Your File/Perfmon/Wait Statistics Over Time, Too
When you use the @OutputTableNameFileStats parameter (or its PerfmonStats or WaitStats siblings), the latest version of sp_AskBrent® automatically creates a matching view, too, with a suffix of _Deltas. For example, if your results are in DBAtools.dbo.WaitStats, then sp_AskBrent automatically creates a view called WaitStats_Deltas. Querying that view will give you trending data over time because it automatically matches up samples for you to get running averages.
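For example, with the wait stats table name from the earlier logging example, a trending query might look like this – the view name here is an assumption based on that naming convention:

```sql
-- Hypothetical view name: derived from the @OutputTableNameWaitStats
-- value ('AskBrentResults_WaitStats') used earlier, plus _Deltas.
-- The sort column is an assumption too; check the view's definition.
SELECT TOP 100 *
FROM DBAtools.dbo.AskBrentResults_WaitStats_Deltas
ORDER BY CheckDate DESC;
```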
This way, if you run sp_AskBrent® in an Agent job every 5 minutes, you can track your SQL Server’s statistics over time. You can query it back out with SSMS, or your favorite reporting tool.
You can grab sp_AskBrent® along with all our other scripts in our handy download pack. Enjoy!
If you’re a full time production database administrator, you need a staging environment.
This is where you test infrastructure tasks like failing over mirroring, doing a disaster recovery role swap, cluster quorum changes, log shipping configuration, and patching. You need to test this stuff repeatedly so that when you’re under pressure in the production environment, you feel comfortable and confident. Bonus points if you use this environment to build checklists and screenshots for rarely-done tasks like recovering from corruption.
You can’t use the development or QA environment because many of these tasks will take the entire environment down – sometimes by design, and sometimes by accident. If you’re in the middle of testing a patch at 9:30AM, and something breaks, and you’ve got a meeting starting at 10AM, you don’t want to have the entire developer team sitting idle because their SQL Server is down. You want to be able to leave this environment in a broken state for hours or days without feeling clock pressure.
Staging is the DBA’s development environment.
You’re not going to use it for load testing or performance tuning, just rehearsing and testing infrastructure changes you might make in production. As such, the hardware quality and quantity don’t have to match production, but the environment needs to be as logically similar as possible.
For example, if your production environment consists of a 3-node production failover cluster and a 2-node cluster in DR, with log shipping across the two, then you’re going to want at least one cluster, plus a separate server to log ship to.
The individual nodes can be low-powered boxes – think single-CPU desktops with 16GB RAM and a couple of $250 1TB SSDs. (You want SSDs because you’re going to be doing a lot of backup/restore on these.) In theory, they can be VMs too, but often sysadmins freak out when you start asking for several terabytes of shared storage space for a staging environment. For $250, I can just avoid that argument altogether.
Use the $50 SQL Server Developer Edition, and this whole thing costs you less than a grand.
Step 1: get a job as a developer or a sysadmin.
Step 2: stand near the database.
Step 3: wait.
Almost every DBA I know got their start this way. Sooner or later, something went wrong with the database, and management grabbed the nearest person and yelled, “You! You look smart and helpful! Can you figure out how to get this thing back online?”
This person then spent more and more time doing stuff related to the database until they began calling themselves an Accidental DBA. At that point, the acronym DBA still wasn’t on their email signature, and they still didn’t feel comfortable using that acronym by itself. They were still a developer or sysadmin.
But wait a little longer, and that person’s next job – either at the same company, or another – was full time DBA.
There are no shortcuts. Here’s why.
Almost everyone who isn’t a DBA thinks they can somehow shortcut this process. They think they can present a near-empty resume and say, “I’ve learned a lot about databases by studying them and playing around with them. Give me a job managing your databases.”
Thing is, when companies have enough databases to hire a full time DBA, they’re saying, “Our data is valuable enough to warrant a full time professional watching over it.” That means they don’t want someone without real-world experience. After all, if they have that many databases, they likely also have a lot of developers and/or sysadmins, and someone has likely already been filling the Accidental DBA role. The company isn’t likely to hire you in as a full time DBA if they have people on staff with more real-world experience than you.
Certifications aren’t going to get you there, either, because the Internet is chock full of braindumps – sample questions and answers that are sometimes from the actual tests themselves. Seasoned IT managers understand that certifications are icing, not cake – they don’t prove that a candidate can do a job. Certification tests aren’t even related to the job tasks that IT workers perform, as I ranted about on the Away From the Keyboard podcast.
If you want a DBA job, start with the job you have.
Stand near the database server. Look smart. Be helpful.
And above all, remember that the people who will hire you for your next DBA job are already around you.
This year, we’re trying something new: every Monday, we’re going to show you one module out of our online training classes.
It’s the whole module, not a sneak peek, and it’s available the whole week. But you’d better not put it off – this post will disappear the following week, and get replaced by a new post.
This week, it’s Brent Ozar explaining storage tiering with a table of SQL Server bloggers:
UPDATE JAN 24 – EXPIRED!
(If you don’t see the video above, you’re reading this somewhere that doesn’t support video embedding. You’ll need to read the post on our blog.)
Wanna see more? Check out our Virtualization, Storage, and Hardware for DBAs course. For $299, you get 18 months of access to stream it on your desktop, laptop, tablet, or phone.
In my last post, I explained the kind, gentle way to talk people out of the SA account.
Today, I’ll describe a time when Option 1 didn’t work, and I had to get ugly.
Power User: “You can’t change the SA password. It’s embedded everywhere. Everything we have relies on it, and we all use it for all kinds of processes.”
Me: “I see. Alright, well, talk to you soon.”
(The next day)
Power User: “EVERYTHING IS DOWN! THE SA ACCOUNT PASSWORD ISN’T WORKING! DID YOU RESET IT?”
Me: “Of course not. You told me not to.”
Power User: “THEN WHO DID IT?”
Me: “Oh, I have no way of knowing. Anyone who uses the account can change the password with the ALTER LOGIN command. And you said everyone has it, right?”
Power User: “YES, BUT…CHANGE IT BACK. NOW. SET IT TO P@SSW0RD1 THE WAY IT ALWAYS WAS.”
Me: “Sure, but I can’t do that kind of thing without a ticket. Just create a help desk ticket, and I’ll set it to whatever you want.”
(The next day, after the password was reset)
Power User: “EVERYTHING IS DOWN AGAIN! THE SA PASSWORD GOT RESET AGAIN!”
Me: “Oh, my. That’s unfortunate.”
Power User: “YOU ARE DOING THIS AREN’T YOU?”
Me: “No, but I’m worried because at this point, if it’s embedded all over the place, someone might have scripted out the ALTER LOGIN command. They might be doing it programmatically. It might happen again, at any time. Right when you least expect it.”
Power User: “I HATE YOU!”
Me: “Well, we certainly can’t have that. How about I give you a nice, new, separate account of your own, not SA, and you start using that instead? No one else will have the password but you, and that way you won’t have to worry about anyone else changing it.”
Power User: “NO, I JUST WANT YOU TO RESET THE….ACTUALLY, YES, I SEE WHAT YOU DID THERE. GIVE ME MY OWN ACCOUNT.”
So you’ve got a bunch of people using the SA account all over the place. It’s hard-coded into connection strings, embedded in linked servers, and the password is on post-it notes throughout the data center.
Step 1: Create an alternative SA account.
In a perfect world, you’d create a minimally-privileged AD login that only has limited access to specific databases.
However, when you’ve got a toddler running with scissors and razors, sometimes you’re happy just to get the razors out of their hands first, and then you’ll work on the scissors next. One step at a time. Preferably not running.
For now, create another SQL account with DBO permissions on all of the databases involved with the application. (If you’re dealing with multiple different tenants on the same server, give them each their own SQL login.) Let them be complete owners of their databases for now.
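As a rough sketch, that setup might look like this – the login name, password, and database name are all placeholders, and in real life you’d use a strong generated password instead of hard-coding one:

```sql
-- Placeholder names: swap in your own login, password, and database(s).
CREATE LOGIN AppOwnerLogin WITH PASSWORD = 'UseAStrongGeneratedPasswordHere1!';
GO

USE YourAppDatabase;
GO

-- Map the login into the database and make it the database owner role member.
CREATE USER AppOwnerLogin FOR LOGIN AppOwnerLogin;
ALTER ROLE db_owner ADD MEMBER AppOwnerLogin;
GO
```

Repeat the database-level part for each database the application touches.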
Step 2: Explain your concerns to the stakeholders.
If several different departments have databases on the same server, and the SA account is being used, email all of the departments to explain what’s going on:
Hi! Your application, ___, lives on SQL Server ___. During security checks, we discovered that ___ uses a highly privileged account. This means they can drop your database, delete your data, shut down the SQL Server, or leak all of the data to the public.
I’m totally sure they would never do this on purpose – but I’m worried about accidents. I want you to be aware that as long as they’re using this account, there is nothing I can do to prevent accidents like that.
If you’re not okay with that risk, I’ve got an easy plan to secure the server with no disruption to the end users. Are you interested in hearing more about that?
Most stakeholders will hit the roof when they read something like this, and they’ll become your political ally to help get things done. They’ll practically demand that the other users let go of the SA account – but to be ready, you’ll want that replacement account from Step 1 already waiting for them.
Step 3: Give the users the new account – and an SA expiration date.
Armed with the political support of your stakeholders, send an email to the SA-equipped users, and copy your stakeholders:
Hi! We’re going to be changing the SA account’s password on YYYY/MM/DD. To prepare, here’s a new account that you can use for your applications: username ___, password ___. That account has full owner permissions on all of your databases.
I know changing your connection strings will take some time, so we made sure to set that SA date two months from now to give you plenty of time to prepare. On the password change date, I’ll be right here in my office for four hours after the change to help you troubleshoot any problems that arise.
Inevitably, you’re going to get grumbles from the users who have to make the change. They may even say they have to be system admin on their SQL Server. Be prepared to offer them options:
- Postpone the transition until YYYY/MM/DD, or
- Move to your own SQL Server, thereby avoiding the risk for the other stakeholders, or
- Get the other stakeholders to sign off that they’re okay with the continued use of SA (but at the same time, they’re also okay with me no longer being on call to support this server, because I can’t support servers where non-DBAs have SA permissions)
That last one takes the most backbone – you have to stand up for yourself. There’s absolutely nothing wrong with the business letting lots of people be SA on the server – as long as they’re in the on-call rotation with me.
Step 4: Only people in the on-call rotation get SA access.
Not along with me – no, in the rotation.
I’ll gladly take more people in the SQL Server on-call rotation with me. And I’m perfectly willing to present that to the stakeholders. The conversation inevitably goes like this:
Me: “OK, cool, so you’ll be in the rotation then? You’re on next week, and I’ll be off. If jobs fail, backups fail, cluster goes down, etc, it’s on you.”
Developer: “No, wait, what? No, I don’t wanna be on call for SQL Server. I’m a developer – I don’t know how to troubleshoot that stuff.”
Me: “No? Are you saying you don’t know how to manage the database? But you want the rights to do it?”
Stakeholder: “Yeah, no, let’s take that SA permission away then.”
The conversation doesn’t always go that way, and in my next post, I’ll explain Option 2.