In this week’s Office Hours Q&A, the whole team shows up – but Doug is the invisible man. We answer questions about hot-add CPU and memory, whether to sync jobs across Availability Group replicas, and how you get a single query to use more than 100GB of RAM.
It’s easy to get caught up in daily incidents, tickets, and special projects. Like a good scout, though, a core task in a DBA’s job is to be prepared. You have to set time aside in your schedule to:
1. Practice responding to corruption. Gail Shaw’s “Help, my database is corrupt, now what?” post should be on your bookmark toolbar. On your development server, stop the service, bust out a hex editor, and purposely break some of your database pages. Don’t try to figure out which pages you’re corrupting – act like your storage would, and just trash random pages in the file. Then start SQL Server back up, run DBCC CHECKDB, and follow Gail’s checklist.
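Once your deliberately damaged database is back online, the checking part is the easy bit – a minimal sketch, assuming a test database named CorruptionDemo (the name is just a placeholder):

```sql
-- Assumes a sacrificial test database named CorruptionDemo that you've
-- deliberately damaged. Run a full check without the noise:
DBCC CHECKDB ('CorruptionDemo') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```

The WITH NO_INFOMSGS, ALL_ERRORMSGS options suppress the per-table chatter and make sure you see every error, which is what you want when you’re rehearsing Gail’s checklist.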
2. Automate the restore of full and log backups. When problems strike and you have to restore a database, you don’t have time for the pain of clicking through GUIs. It gets even worse when multiple related databases are involved, or heaven forbid, all of the databases on the server. If you’ve got a third party backup tool, rehearse the complete rebuild of a database or a whole database server. If you’re using native backups, check out this MSSQLtips post on generating restore scripts from all the database files in a folder.
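The shape of a scripted native restore looks roughly like this – the database name and paths here are placeholders, and a real script would loop through every log file in the folder:

```sql
-- Minimal sketch of a scripted restore; names and paths are placeholders.
RESTORE DATABASE [YourDatabase]
    FROM DISK = N'X:\Backups\YourDatabase_FULL.bak'
    WITH NORECOVERY, REPLACE;

RESTORE LOG [YourDatabase]
    FROM DISK = N'X:\Backups\YourDatabase_LOG_1.trn'
    WITH NORECOVERY;

-- ...apply the rest of the log chain in order, then bring it online:
RESTORE DATABASE [YourDatabase] WITH RECOVERY;
```

The point of rehearsing is that when it’s 3am and real, you’re editing file names in a script instead of clicking through a wizard.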
3. Set your monitoring tool’s thresholds. If you’ve got an Outlook rule to dump all of your alerts into a different folder, you’re losing the monitoring game. Spend time tweaking the thresholds so that you don’t get spammy alerts. The whole point of a good monitoring tool is that you get alerted when something is truly wrong, and you can take action before users figure things out. If an alert’s not actionable, it needs to go.
4. Take care of yourself. This job is about serving and protecting the data. When disaster strikes, you need to be practiced, ready, and in the zone. You can’t do that if you’re constantly fighting fires and working sixty hours a week. You have to set a hard stop, walk away, and recharge your batteries.
If you’re lucky enough to love what you do, then learning stuff and playing with company servers after hours can feel like a recharge. Sure, it’s even better than working – but never forget what you’re really working for. Work to live – don’t live to work.
Most of you are going to hate this
And TL;DR, there’s a script at the end of the post. But like The Monster At The End Of This Book, it’s worth it not to skip the middle.
There are about a billion but-what-ifs that could come into play. I can’t possibly answer all of those for you. But that’s not the point of this post, anyway! If you’re in a special circumstance, using some fancy features, or doing something utterly deranged to your database, this isn’t the post, or script, for you.
I mean really, unless the size of your log file is causing you some dramatic pain, leave it alone. You should probably go invent cold fusion if log file size is the worst issue in your database. Congratulations.
This is also a lousy place to ask me if you can shrink your log file. I have no idea how or why it got that size. There’s free space now because you’re using FULL recovery model and you took a log backup, or you’re in SIMPLE and your database hit a CHECKPOINT. No magic there. It may very well grow to that size again, so shrinking it could be a really dumb idea.
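For the record, in FULL recovery the only thing that frees up space inside the log is a log backup – a sketch, with placeholder names:

```sql
-- Placeholder database name and path; in real life this runs
-- frequently as a scheduled Agent job, not by hand.
BACKUP LOG [YourDatabase]
    TO DISK = N'X:\Backups\YourDatabase_LOG.trn';
```

If the log keeps filling up between backups, the fix is more frequent log backups, not a bigger shrink.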
So what’s the point? Lots of people ask me this question: clients, Office Hours attendees, random passersby on the street who recognize me (even without my Robot). I usually give them the same answer and explanation, unless I have ample evidence that their fancy and/or deranged ways require a different estimate.
From the ivory tower
A good STARTING POINT for your log file is twice the size of the largest index in your database, or 25% of the database size. Whichever is larger.
If the largest object in your database is larger than 25% of your database, you are likely running some type of maintenance. Index rebuilds require log space equal to the size of the object being rebuilt. My rule of thumb is twice that space, in case you’re doing anything else during that maintenance, like ETL, reports, dragging data to and fro, purging data, whatever. If you’re only ever reorganizing the largest object, you may not need all that space. Are you sure you’re ONLY ever reorganizing it? I’ll wait.
But 25% seems so random!
Well, kinda. But you’re here for a starting point. If you’re not Super DBA, taking baselines and trending your database file sizes over time, random is better than nothing. It buys you some leeway, too.
- If you miss a log backup (maintenance plans got you down?)
- If you’re not taking frequent enough log backups (can I interest you in RPO/RTO insurance?)
- If you run other long/large transactions (SSIS won’t save you)
You’ll have a fair amount of room to do your dirty work. Most sane and rational people consider this to be a positive thing.
But what if my log file still grows?
Well, then you found out you need a bigger log file. Or you need to take log backups more frequently. Perhaps those hourly log backups aren’t working out as you planned, hm?
And if your log file never grows, you’ll look really smart. And you’ll never have to wait for your log file to expand. They don’t benefit from Instant File Initialization the way data files do.
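If you do size the log up front, grow it yourself during a quiet window rather than letting activity do it – a sketch with a placeholder database, logical file name, and size:

```sql
-- Placeholder names and size; look up the logical log file name
-- in sys.database_files first.
ALTER DATABASE [YourDatabase]
    MODIFY FILE (NAME = N'YourDatabase_log', SIZE = 25GB);
```

Growing it once, deliberately, also keeps you from accumulating hundreds of tiny virtual log files from repeated small auto-growths.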
Show me the script already
It’s all right under here. Don’t forget to change the USE statement. All sizes are in GB. If your database is smaller than 1GB, you’re one of those lucky DBAs who can take vacations and stuff. Go do that. Life is short.
If your database is under 1GB, and your log file is over 1GB, start taking log backups. I’m pretty sure you’re not.
USE [StackOverflow]; --You'll probably want to use your own database here
--Unless you work at Stack Overflow
--No, I'm not writing this to loop through all of your databases

WITH [log_size] AS (
    SELECT TOP 1
        SCHEMA_NAME([t].[schema_id]) AS [schema_name],
        [t].[name] AS [table_name],
        [i].[name],
        [p].[rows] AS [row_count],
        CAST(( SUM([a].[total_pages]) * 8. ) / 1024. / 1024. AS DECIMAL(18, 2)) AS [index_total_space_gb],
        ( SUM([a].[total_pages]) * 8 ) / 1024 / 1024 * 2 AS [largest_index_times_two_(gb)],
        ( SELECT ( SUM([mf].[size]) * 8 ) / 1024 / 1024
          FROM [sys].[master_files] AS [mf]
          WHERE [mf].[database_id] = DB_ID() ) AS [database_size_(gb)],
        ( SELECT CAST(( SUM([mf].[size]) * 8 ) / 1024 / 1024 AS INT)
          FROM [sys].[master_files] AS [mf]
          WHERE [mf].[database_id] = DB_ID()
            AND [mf].[type_desc] = 'LOG' ) AS [current_log_size_(gb)],
        ( SELECT CAST(( SUM([mf].[size]) * 8 ) / 1024 / 1024 * .25 AS INT)
          FROM [sys].[master_files] AS [mf]
          WHERE [mf].[database_id] = DB_ID()
            AND [mf].[type_desc] = 'ROWS' ) AS [25%_of_database_(gb)]
    FROM [sys].[tables] [t]
    INNER JOIN [sys].[indexes] [i]
        ON [t].[object_id] = [i].[object_id]
    INNER JOIN [sys].[partitions] [p]
        ON [i].[object_id] = [p].[object_id]
        AND [i].[index_id] = [p].[index_id]
    INNER JOIN [sys].[allocation_units] [a]
        ON [p].[partition_id] = [a].[container_id]
    WHERE [t].[is_ms_shipped] = 0
    GROUP BY SCHEMA_NAME([t].[schema_id]), [t].[name], [i].[name], [p].[rows]
    ORDER BY [index_total_space_gb] DESC
)
SELECT *,
    CASE WHEN [ls].[largest_index_times_two_(gb)] > [ls].[25%_of_database_(gb)]
         THEN [ls].[largest_index_times_two_(gb)]
         ELSE [ls].[25%_of_database_(gb)]
    END AS [maybe_this_is_a_good_log_size(gb)]
FROM [log_size] AS [ls]
OPTION ( RECOMPILE );
I recently ran into a performance problem that had me scratching my head. How could a stored procedure’s plan have changed, and yet it not be reflected in sys.dm_exec_procedure_stats?
So here’s what happened and what I knew:
- 7:45am: Users were complaining that a page was timing out
- 8:00am: Developer emailed the offending stored procedure and input parameter values
- 8:30am: Developer reports that the page is fast again
- 8:35am: I start troubleshooting even though it’s already fixed, since we need to figure out what happened
- UPDATE STATISTICS job runs every 30 minutes
- WhoIsActive data saved every 30 seconds
I just knew that the UPDATE STATISTICS job caused the stored procedure to be recompiled and get a better execution plan. But I needed to confirm it.
I first ran the stored procedure with the provided input parameter values and checked the execution plan.
I ran DBCC SHOW_STATISTICS against each of the indexes in the plan and checked the Updated column in the first result set. One of the indexes did get updated.
Example DBCC SHOW_STATISTICS using StackOverflow database:
DBCC SHOW_STATISTICS('dbo.Posts', 'PK_Posts__Id');
I next checked when the plan was cached and which plan was in cache:
SELECT ps.cached_time, qp.query_plan, *
FROM sys.dm_exec_procedure_stats ps
CROSS APPLY sys.dm_exec_query_plan(ps.plan_handle) qp
WHERE [object_id] = OBJECT_ID('SomeStoredProc');
It returned 2 rows: one for the application executions and one for my adhoc execution. I could tell which one was mine since its execution_count was 1.
Oftentimes, my plan will be different from the app’s plan, but this time they were the same. If they are different, though, you can mimic the app in SSMS by using its SET options. To get those options, grab the options_text value for the app’s username for the Login event in an XE session (use the System Monitoring\Connection Tracking event session template). Run those SET options in an SSMS query window, and then run the stored procedure in that same window. You should get the same execution plan as the application.
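As an illustration, if options_text showed the app connecting with different options than SSMS’s defaults, you’d run something along these lines before executing the proc – these particular values, the proc name, and the parameter are all hypothetical, so use whatever options_text actually reports:

```sql
-- Illustrative only: match whatever options_text reports for the app's login.
SET ANSI_NULLS ON;
SET ANSI_PADDING ON;
SET ANSI_WARNINGS ON;
SET QUOTED_IDENTIFIER ON;
SET ARITHABORT OFF;  -- SSMS defaults this ON; many app drivers leave it OFF

EXEC dbo.SomeStoredProc @SomeParameter = 42;  -- hypothetical proc and parameter
```

ARITHABORT is the classic mismatch between SSMS and application connections, and a mismatch there means you can get a different cached plan than the app’s.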
The app’s plan was cached the day before. But wait a second! My assumption was that it had recompiled this morning due to the updated stats.
I dug into the WhoIsActive data:
SELECT TOP 100 query_plan, *
FROM DBA.dbo.WhoIsActive
WHERE CAST(sql_command AS VARCHAR(MAX)) LIKE '%SomeStoredProc%'
AND collection_time BETWEEN '2016-01-04 07:30:00.000' AND '2016-01-04 08:30:00.000';
It did have a different execution plan that morning!
But why would the cached_time value show the day before?
I scratched my head for a bit and then took to Twitter (#sqlhelp). Kendra replied and said to check the SQL statement’s compile time, not the proc’s compile time.
SQL Statement Cached Time:
SELECT *
FROM (
    SELECT total_worker_time / execution_count AS [Avg CPU Time],
        creation_time AS cached_time,
        SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
            ((CASE qs.statement_end_offset
                  WHEN -1 THEN DATALENGTH(st.text)
                  ELSE qs.statement_end_offset
              END - qs.statement_start_offset) / 2) + 1) AS statement_text,
        *
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
) t
WHERE CAST(statement_text AS VARCHAR(MAX)) LIKE '%SELECT blah%' -- <--Something to help you find the query
ORDER BY [Avg CPU Time] DESC;
And there it was. The cached time was from that morning.
I’m still unclear why cached_time in sys.dm_exec_procedure_stats didn’t change when the stored procedure’s plan did change – that is, when the SQL statement’s plan changed as a result of the updated stats.
Brent says: even if you’ve got a monitoring tool, it can be helpful to log sp_WhoIsActive results into a table so that you can go back for troubleshooting later. Just make sure to set up a job to clean out that table periodically.
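The collection job Brent describes can be as simple as a single Agent job step – this sketch assumes a destination table has already been created to match sp_WhoIsActive’s output (the table name here matches the queries earlier in the post):

```sql
-- Sketch of the Agent job step, scheduled every 30 seconds or so.
-- DBA.dbo.WhoIsActive must already exist with a matching schema;
-- sp_WhoIsActive's @return_schema parameter can generate it for you.
EXEC master.dbo.sp_WhoIsActive
    @get_plans = 1,                             -- capture query_plan, as queried above
    @destination_table = 'DBA.dbo.WhoIsActive';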
This year, we’re trying something new: every Monday, we’re going to show you one module out of our online training classes.
It’s the whole module, not a sneak peek, and it’s available the whole week. But you’d better not put it off – this post will disappear the following week, and get replaced by a new post.
In this week’s free video training sample, Kendra Little grills you with 25 core DBA skills questions, 22 infrastructure/platform questions, and 23 performance tuning questions:
(If you don’t see the video above, you’re reading this somewhere that doesn’t support video embedding. You’ll need to read the post on our blog.)
The full course covers resume-building for DBAs, 5 things you should always do in a DBA interview, how to get in front of hiring managers, and much more. Go check it out, and if you’re an Everything Bundle subscriber, you can start watching the training course now.
In this week’s Office Hours Q&A, Erik, Tara, Doug, and the new guy Richie Rump answer fast-paced questions about logging Perfmon counters to a table, Standard Edition’s 4-socket licensing limit, whether odd or even MAXDOP numbers make a difference, and – one of our favorite questions in a long time – is it a good practice to cut your SQL Server’s max server memory in half every two hours just to, uh, clean things up?
I love living in the city
Blog posts about people’s favorite data sets seem to be popular these days, so I’m throwing my hat in the ring.
NYC has been collecting all sorts of data from all sorts of sources. There’s some really interesting stuff in here.
Another personal favorite of mine is MTA turnstile data. If you’re a developer looking to hone your ETL skills, this is a great dataset, because it’s kind of a mess. I actually had to use PowerShell to fix inconsistencies with the older text files, which I’m still recovering from. I won’t spoil all the surprises for you.
Of course, there’s Stack Overflow.
You can’t go wrong with data from either of these sources. They’re pretty big. The main problem I have with Adventure Works is that it’s a really small database. It really doesn’t mimic the large databases that people deal with in the real world, unless you run a script to make it bigger. The other problem with Adventure Works is that it went out of business a decade ago because no one wanted to buy yellow bikes. I’ve been learning a bit about Oracle, and their sample data sets are even smaller. If anyone knows of better ones, leave a comment.
Anyway, get downloading! Just don’t ask me about SSIS imports. I still haven’t opened it.
Thanks for reading!
I am not a proponent of shrinking databases, but sometimes you have to because your momma said to. Sometimes that database is tempdb.
It used to be that we were warned against shrinking tempdb because it could cause corruption, so your only recourse was to restart the SQL Server service. Paul Randal let us know that this is no longer a problem.
LET’S SAY YOU HAVE TO SHRINK TEMPDB
Like your life depended on it. Or perhaps you needed the alerts to stop.
Say a user ran an atrocious adhoc query that grew your tempdb so much that your disk space alert fired, and:
- you needed that alert to stop
- the storage team is not going to give you more space
- the user promised to never do that again
So you try to shrink tempdb, but it just won’t shrink.
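The shrink attempt itself looks something like this – the file name here is the default first tempdb data file, so check sys.database_files in tempdb for yours:

```sql
USE [tempdb];
CHECKPOINT;
-- Target size is in MB; repeat for each tempdb data file.
DBCC SHRINKFILE (N'tempdev', 1024);
```

Internal objects and cached plans can pin pages in the file, which is why this sometimes refuses to budge no matter how many times you run it.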
Try clearing the plan cache:
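A word of warning before running this in production: it flushes every plan on the instance, so expect a burst of compiles afterward.

```sql
DBCC FREEPROCCACHE;
```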
And then try shrinking tempdb again.
I came across this solution recently when I had to shrink tempdb. I tried shrinking each of the 8 data files plus CHECKPOINTs, repeatedly. It would not budge. I almost threw in the towel and emailed out that the space issue would be fixed during our next patching window, but then I found David Levy’s reply. DBCC FREEPROCCACHE worked like a charm.
I’ve set up Mirroring about a billion times
I’m not bragging about that. I’d rather say that I set up a billion AGs, and not one of them ever failed. But then I’d be lying to you; those things fail like government programs. One thing I’d never done, though, is set up Mirroring with a Witness. I never wanted automatic failover, because it’s only one database at a time. If for some reason one database out of all that I had mirrored ever turned Ramblin’ Man and failed over to another server, there would understandably be some application consternation. Not to mention any maintenance and internal operations. They don’t react well to sudden database unavailability.
Of course, doing anything for the first time is horrible. Just ask my second wife.
Here’s where things got awkward
I have my databases! This is my top secret development environment. Stack Overflow is in an AG, and I had set up two other Mirrors: one synch and one asynch. I wanted to have a variety of setups to test some scripts against.
Alright, let’s set up Mirroring…
This is so easy. Seriously. Why doesn’t everyone do this? Why do you complicate your short, short lives with Availability Groups? Are they AlwaysOn? Are they Always On? WHO KNOWS? Not even Microsoft.
HIGH FIVES ALL ARO-
This is the error text:
The ALTER DATABASE command could not be sent to the remote server instance 'TCP://ORACLEDB.darling.com:5022'. The database mirroring configuration was not changed. Verify that the server is connected, and try again.
Alright, that’s silly. I used the GUI. Instead of going to bed I’ll spend some time checking all my VM network settings. BRB.
I’m back. They were all correct. I could ping and telnet and set up linked servers and RDP. What in the name of Shub-Niggurath is going on with this thing?
I can even see the Endpoint! So close, and yet so far…
Where are we now?
This is a good time for a quick recap
- Mirroring is up and running synchronously
- The endpoint is configured on the witness
- We get an error when we try to connect to the witness
TO THE ERROR LOG!
Well whaddya know? That’s a really good clue. Encryption and stuff. There’s no compatible algorithm. Ain’t that somethin’? You’d think that Microsoft would be cool about setting up the same kind of encryption across all the different Endpoints, if using different encryption would cause the setup to fail. Right guys? Heh. Right? Hey, hello?
Alright, let’s see what I need to be a matchmaker.
Since we have them both scripted out already, let’s just drop and re-create the Witness Endpoint with the right encryption algorithm.
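Scripted out, the fix looks roughly like this – the endpoint name is an assumption on my part, the port comes from the error message earlier, and the algorithm choice is the part that actually matters:

```sql
-- Sketch: re-create the witness endpoint so its encryption algorithm
-- matches the partners'. Endpoint name is hypothetical; port 5022
-- comes from the error message above.
DROP ENDPOINT [Mirroring];

CREATE ENDPOINT [Mirroring]
    STATE = STARTED
    AS TCP (LISTENER_PORT = 5022)
    FOR DATABASE_MIRRORING (ROLE = WITNESS, ENCRYPTION = REQUIRED ALGORITHM AES);
```

Script out the existing partner endpoints first and copy their ENCRYPTION clause exactly, rather than trusting the GUI’s defaults.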
That did not result in a forest fire. I’m hopeful. Sort of. It’s been a long night and I think I can see tomorrow from here.
Meanwhile, back on the Primary…
It worked! Now I have a Witness, and I can shut all my VMs down. That was so much fun.
What did we learn?
Microsoft hates you and doesn’t want you to sleep. Just kidding. Mostly. But seriously, why would they do that?
It mostly goes to show that it’s always a smart idea to use that little script button at the top of (most) GUIs in SSMS. Who knows what kind of foolishness you’ll find? A little reading can save you a lot of time troubleshooting errors that make you feel insane.
Thanks for reading!
During our very first training class, we showed the students how we use SET STATISTICS IO ON to get the number of logical reads performed on each table in a query, and then sum ’em up to see the query’s overall impact. It’s kind of a painful, manual process.
Sitting in the back row of the class (because that’s how he rolls), Richie Rump saw that process and thought to himself, “I bet I could make that easier.”
The bad news is that he probably didn’t learn much the rest of that day in class, because he immediately started building StatisticsParser.com.
The good news is that you got StatisticsParser, hahaha.
That kind of mentality is what we’re all about. We look for SQL Server tasks that are complex, and figure out how to make them easier for free. Whether it’s our blog posts, our community presentations, or our tools, we’re always looking for new ways to make your job suck less.
So let’s talk to Richie and figure out what makes him tick.
Brent: I think of each person as a toolbox. Over time, as we experience situations and solve problems, they become tools in our toolbox. I think back to my time in hotels, my work in accounting, and even my love of cars as various tools in my toolbox that I can use to get things done. What are some of the tools in your toolbox that make up who you are?
Richie: This surprises some, but I spent almost five years as a project manager. I even earned the dreaded Project Management Professional (PMP) certification. I also started my career as an Access database programmer (way back in Access 2.0). Most of my career I spent using Microsoft tooling like Visual Basic, ASP, C#, and the .NET Framework. I also spent a fair amount of time as a software architect. After my time as a project manager, I pivoted my career towards the data side and absorbed all of the SQL Server knowledge that I could. I spent almost twelve years in the supply-chain management vertical and have spent time in the accounting, payment processing, and legal areas as well. After reading all of that, I feel old.
Brent: Today, you’re a serial (or maybe parallel) community activist – starting dotNet Miami, helping put on the South Florida Code Camp, co-founding the Away From the Keyboard podcast, building StatisticsParser and sp_DataProfile, giving community presentations, etc. Take me back to the first time you said to yourself, “I should create something that gives back.”
Richie: That probably would be starting dotNet Miami. For years I followed the SQL Server community and the WordPress communities. I was always impressed by their generosity in not only their sharing of technical knowledge but in the way they care for each other as human beings. I looked around the developer community in Miami and wondered “Why can’t we have a .NET community like that in Miami?” So I grabbed a few people and met at a sports bar and we talked about starting a group. Six months later dotNet Miami was born. We’re still going strong and are committed to being a place where we can grow as technologists and as people. My favorite description of dotNet Miami is “We are not competitors, we are comrades.”
Brent: For years, you’ve been making our classes and webcasts more fun by poking fun at us. Now, you’re going to be on the other side. How do we help raise the next generation of the peanut gallery?
Richie: It takes commitment, that’s for sure. I think the whole peanut gallery thing came from getting to know the team personally. Twitter conversations, in-person conference discussions, and training classes all led me not to poke fun at the Brent Ozar Unlimited team, but to have a friendly conversation with friends. So you want to join the peanut gallery? Get to know us better over Twitter, chat with us at a conference, or join us for in-person training. (See what I did there?)
Brent: One last thing everybody’s gonna ask: where’s @Jorriss come from?
Richie: Ha! Back in the early days of the Internet we had these things called “handles”. For a while I was going by richier, but that never sat well with me. So one night in college, a bunch of us geeky types were creating personas that would live in the Star Wars universe (don’t judge). There was a character in Timothy Zahn’s Heir to the Empire trilogy called Joruus C’baoth that I dug, so I changed it a bit and out came Jorriss Orroz. From there, the handle just stuck. Bonus points if you can figure out where the Orroz comes from.