You’ve been doing this database thing for a while, and you’re ready to get serious about it. What’s the next step?
Step 1: Define your specialty in one sentence.
If you say you do everything, you compete with everyone.
You want to be the only one they want. That means you’re:
- Actively sought-after
- Uniquely qualified
- A very high value for short bursts of time
- Respected for your opinion
- Worth more than your competitors (more on that later)
This sounds selfish, but remember – it’s not about you. It’s about your customers (whether they’re internal or external) and your ability to help them.
To pick your specialization, watch my webcast archive How to Get Senior In Your Title. I talk about the different types of DBAs and what they specialize in. Here’s one of the important slides from that session:
Most of you out in the crowd are going to say, “But I do all of these.” Sure you do – today. But we’re talking about where you want to be two years from now if you’re going to really stand out. Not only am I encouraging you to pick one of the columns, but I’m even encouraging you to focus on a specific horizontal row.
Examples of specialties include:
- “This server has to be reliable. We need AlwaysOn Availability Groups. I know just who to call.”
- “We need to manage thousands of servers with easier automation. I know the right person for the job.”
- “Our SQL Servers in VMware are just too slow, and nobody knows whose fault it is. I know who can tell.”
- “We need to offload our full text search, but we have no idea what to use. I know somebody who does.”
Notice that I’m phrasing these in a one-sentence pain point. You need to be known for resolving someone’s pain. This is the funny thing about business and consulting – you get paid the most to relieve urgent pain, not to provide keeping-the-lights-on maintenance.
The first step in your two-year plan is to write the one-sentence pain you want to resolve.
Step 2: Assess your current skills and your target skills.
Thinking about your one-sentence pain point:
- How many times have you relieved that pain?
- How many times have you failed to relieve it?
- When you hit an impasse, who did you escalate it to?
- Have you sketched out a process for diagnosing it? Has anyone?
- Have you documented the process for others to follow?
The more answers you have, and the more confident you are giving those answers aloud to someone else, the better your skills are. What, you expected a true/false multiple choice assessment test? Technology moves so fast that often the questions aren’t even right, let alone the answers.
Here’s a longer version of that assessment that I use for my own skills testing:
- I don’t know where the pain is coming from.
- I can identify the pain in clear terms.
- I know several possible root causes of the pain.
- I can identify exactly which one is at fault here.
- I know several ways to relieve that pain.
- I can identify exactly which one is right here.
- I’ve documented my triage process.
- I’ve hit situations where my process has been wrong, and I’ve learned from it.
From those levels, what level do you think you get paid for?
Surprise – it’s #1. You know plenty of people who are getting paid right now even though they have absolutely no idea where the pain is coming from. However, the higher your level, the easier it is to get paid more. (Don’t think that just because you’re on level 7, you’re making a bazillion dollars – there’s plenty of folks who aren’t great at negotiating their value, either.)
Figure out what level you’re at today, and get a rough idea of what level you want to be at in two years. Now let’s figure out how to get there.
Step 3: Build a 2-year learning plan to make that leap.
Divide the number of levels you want to jump by the amount of time you have. If you want to go up four levels, and you’ve got two years to do it, then you need to progress a level every 6 months.
This sounds really easy, but there’s a problem: you’re probably not repeatedly solving this pain point at your day job. You probably solve it every now and then, but not over and over in a way that helps you refine your technique.
That’s why a 2-year learning plan is really a 2-year sharing plan.
Nothing teaches you something like being forced to teach it to someone else. Heck, even building this blog post (and a presentation on it a few weeks ago) made me flesh out my own philosophies!
But to share, you have to get permission. Start by having this discussion with your manager:
Dear Manager – Recently, we ran into the problem of ____. To get relief, I did ____. Are you happy with that relief? If so, I’d like to talk about what I learned at my local SQL Server user group. I won’t mention our company name. Is that OK? Sincerely, Me
By having that discussion, you’re also making sure the manager is really satisfied with your pain relief efforts and that they saw value in your work. (After all, think of them as one of your first pain relief clients.)
Once you’ve got permission, here’s how you build the 2-year sharing plan: every level jump equals one user group presentation.
- Write the user group presentation agenda in 4-5 bullet points.
- Write a blog post about each bullet point. (The words in your blog post are what you’ll say out loud in your session – think about it as writing your script.)
- Build slides that help tell the story, but the slides are not the story. Don’t transcribe your blog posts word-for-word on the slide.
For example, if you need to hit the level “I know several ways to relieve that pain,” and your specialization is improving the performance of virtual SQL Servers, your user group session could be titled “5 Ways to Make Your Virtual SQL Server Faster.” You’d then write a blog post about each of the 5 ways. Presto, there’s your session resources.
At the end of your 2-year sharing plan, you’ve built up a solid repertoire of material, plus built up your own level of expertise. (You’ve also built up a little bit of a reputation – but more on that later.)
Step 4: Decide what lifestyle works best for you.
How much risk can you tolerate?
- Some. I could miss a couple of paychecks a year and manage my own benefits if I earned more.
- Lots. I’d be willing to go without income for a month or two per year if I could earn lots more.
- None. A very predictable salary and benefits are absolute requirements for me.
This determines whether you should be a full time employee, a long-term contractor that switches positions periodically, or a short-term consultant. In a nutshell, the differences are:
Consultants tell you what to do. They listen to your business problems, come up with solutions, and guide your staff on how to do it. They are typically short-term stints – a couple of days per month at a client, multiple clients at a time.
Contractors do what they’re told. They get a list of required solutions from the business and implement those solutions. They typically work together for long stints, showing up at the same client every day for months at a time, with only one live client relationship.
Full time employees do a mix of this. They come up with ideas, plus implement some of those ideas.
There’s no one answer that’s better for everyone. Heck, I’ve even changed my answer a few times over the last several years! It comes down to finding the right risk/reward balance for your own lifestyle needs, and then bringing the right customers in the door.
Step 5: Decide how you’ll market yourself.
Consultants sell advice, not manual labor, so they have many clients – which means doing a lot of sales.
Contractors sell labor, so they have fewer clients – which means less sales efforts.
Full time employees (FTEs) only have one sales push every few years when they change jobs.
Our company is a good example of the work required to do marketing and sales when you want to scale beyond one or two people:
- We have tens of thousands of regular blog readers
- Thousands of them attend the weekly webcasts
- Hundreds of them email us per month asking for help
- A few dozen turn into serious sales opportunities
- Around a dozen will book consulting engagements with us
This funnel approach demonstrates inbound marketing – using lots of free material to get the word out about your services and invite them to contact you for personal help. It’s a lot of hard work – very hard work.
The other approach is outbound marketing – cold calls to strangers asking if they’ve got your specialized pain point, and then trying to convince them that you’re the right person to bring pain relief. (You can kinda guess how I feel about outbound marketing.) Sure, it sounds slimy – but the takeaway is that it’s hard work, and every bit as hard as doing inbound marketing.
But only one of those options polishes your skills.
Inbound marketing is a rare two-for-one in life – it’s both your 2-year sharing plan, and your 2-year marketing plan. You don’t have much spare time, so you need every bit of it to count. Choose inbound marketing, do your learning and sharing in public, and you’ll write your own ticket.
Presto – You’re two years away from success.
No matter what pain you want to solve, how you want to solve it, or how you want to get paid for it, this simple plan will have you on the road to success. Now get started on writing down that one-sentence pain point!
You manage SQL Server databases, but you never get the chance to take time out of your busy day to test your backups. You assume that just because the jobs are succeeding, that you’ll be able to restore your databases when disaster strikes. Join Brent Ozar as he walks you through several queries of your MSDB backup history tables, checks your RPO and RTO live, and helps you build a recovery strategy for your production databases in this one-hour video.
For more videos like this:
The session list has been published, and we’re excited to say all of us have been selected to speak at the PASS Summit again this year. Here’s our sessions:
Are Your Indexes Hurting You or Helping You? – Jes Schultz Borland – Queries need your help! Your mission, should you choose to accept it, is to make great decisions about what indexes are best for your workload. In this session, we’ll review the difference between clustered and nonclustered indexes, show when to use included columns, understand what sargability means, and introduce statistics. You’ll leave this session with the ability to confidently determine why, or why not, SQL Server uses your indexes when executing queries.
Developers: Who Needs a DBA? – Brent Ozar – You store data in SQL Server, but you don’t have enough work to keep a full-time DBA busy. In just one session, you’ll learn the basics of performance troubleshooting, backup, index tuning, and security. Brent Ozar, recovering developer, will teach you the basic care and feeding of a Microsoft SQL Server 2005, 2008, 2012, or 2014 instance and give you scripts to keep you out of trouble.
Dynamic SQL: Build Fast, Flexible Queries – Jeremiah Peschka – Dynamic SQL is a misunderstood and much maligned part of a DBA’s tool kit – it can be used to solve difficult business problems, respond to diverse data needs, and alleviate performance problems. Many DBAs reject dynamic SQL outright as a potential source of SQL injections, being poorly performing, or just for being a hacky solution in general. Not so! Jeremiah Peschka has been making extensive use of dynamic SQL throughout his career to solve a variety of problems. In this session, we’ll be dispelling these misconceptions and demonstrating how dynamic SQL can become a part of every DBA’s tool kit.
From Minutes to Milliseconds: High-Performance SSRS Tuning – Doug Lane – Even though you’re an experienced report developer or administrator, performance tuning for SQL Server Reporting Services still feels as bewildering and hopeless as folding a fitted bed sheet. You’ve made your data sets smaller and timeouts longer, but it’s not enough to remove the slowness dragging down your reporting environment. In this session, you’ll learn how design and configuration choices put pressure on your report server and techniques to relieve that pressure. You’ll see how to configure your Reporting Services databases for speed, streamline your subscription schedules, and use caching for high-demand reports. You’ll also learn some design strategies to lighten your report processing load. If you want to maximize the speed of your Reporting Services environment and minimize the pain of performance tuning, this session is for you.
Lightning Talk: Conquer CXPACKET and Master MAXDOP – Brent Ozar – CXPACKET waits don’t mean you should set MAXDOP = 1. Microsoft Certified Master Brent Ozar will boil it all down and simplify CXPACKET to show you the real problem – and what you should do about it – in one quick 10-minute Lightning Talk.
Why Does SQL Server Keep Asking For This Index? – Kendra Little – SQL Server says you’d really benefit from an index, but you’d like to know why. Kendra Little will give you scripts to find which queries are asking for a specific missing index. You’ll learn to predict how a new index will change read patterns on the table, whether you need to add the exact index SQL Server is requesting, and how to measure performance improvements from your index changes. If you’re comfortable querying SQL Server’s missing index DMVs and looking at an execution plan here and there, this session is for you.
World’s Worst Performance Tuning Techniques – Kendra Little – Could one of your tricks for making queries faster be way off base? Kendra Little is a Microsoft Certified Master in SQL Server and a performance tuning consultant, which means she’s learned lots of lessons from her mistakes. In this session, you will learn how to stop obsessively updating or creating statistics, find alternatives to forcing an index, and deal with an addiction to ‘recompile’ hints.
Pre-Conference Session: psych! We didn’t get picked for an official pre-con this year (I know, right?) so we’re building our own lunar lander. Stay tuned – we’ll get things ironed out in the next week or two so you can make your official plans.
You’re in charge of making sure the database never goes down, but your managers won’t give you any budget. You’re tired of feeling guilty like it’s your fault there’s an outage – and now there’s help. Microsoft Certified Master Brent Ozar helps businesses understand that they have to pay the piper if they want high availability. He’ll share his simple two-page worksheet to get the business users on your side in this 30-minute webcast.
For more videos like this:
In your business application, you have a slow screen or function. Your team is pointing fingers between the application code, the database, and the hardware.
To make it go faster, here are your main options:
Check all of the options you’re willing to consider, and X out all of the options you’re NOT willing to consider.
The more boxes you X out, the more expensive the rest of the boxes become.
For example, on some projects with third party vendor software – say, SharePoint or Dynamics – managers give me requirements like this:
If those are my only options, I’m going to have to push those limits pretty hard, and I’m going to have to sink a lot of money into those options. I may have to step up to SQL Server Enterprise Edition and cache the entire database in RAM.
On the other hand, when I’m dealing with an in-house application with really agile developers, with the application hosted in Amazon Web Services, the grid looks more like this:
As you can probably guess, the tuning options are much more flexible here – not to mention cheaper.
So when you need to make your app go faster, tell your staff what options are on the table, and which ones are off.
You’ve got great ideas, don’t lie. We want to help you get those ideas out there… and into our scripts. It’s great to say that our tools (sp_AskBrent®, sp_Blitz®, sp_BlitzIndex®, and sp_BlitzCache™) are as good as they are today because of your help. We’re adding two new ways to make it easier to get a chance to share code and collaborate on some awesome tools.
Starting right now, you can head over to http://support.brentozar.com and submit new feature requests or vote on existing feature requests. You only get so many votes until we implement your feature, so make ‘em count!
This is also your chance to interact with us during the development of the feature.
A number of you have done this already! You’ve contributed a bunch of fixes, enhancements, and new checks for sp_Blitz®.
In the past you’ve had to email code contributions. We’ve made it easier – there’s now a form with a place for you to dump sample code. In fact, your sample code gets turned into a feature request in our version control system which, in turn, emails everyone. That really means we’ll get to your changes faster.
Ready to share your code? Head over to http://brentozar.com/contributing-code/ and get started.
Don’t freak out at the licensing agreement – the lawyers caught wind of what we were doing and said we should get you to sign it. All it says is that you’re the person who owns the code and you’re giving us the right to use your code; you still own the code.
What Are You Waiting For?
I sometimes hear database administrators say, “This database must be involved in our performance problem: it has too many connections.”
Lots of connections might cause a problem for your SQL Server, or it might not. The good news is that there’s a way to clearly tell if they’re dragging down performance or not.
Lots of connections can be confusing
If you use old built in tools like sp_who2 to see what’s happening in your SQL Server, many connections make it hard to see what’s going on. Firing up my test instance, I get back more than 45 rows from sp_who2 without any applications running at all. Sorting through that can be a mess.
For this reason, I’m not a fan of sp_who2. My usual question for the SQL Server isn’t “who’s connected?” I usually want to know other things:
- What’s making SQL Server wait right now? I like sp_AskBrent® to answer that.
- What’s running right now, and how long has it been running? I like sp_whoisactive by Adam Machanic to answer that.
Both of these tools help keep lots of connections from slowing YOU down as you look at your SQL Server.
How can I prove that lots of connections are causing a performance problem?
SQL Server can manage a lot of connections without too many issues. Yes, it’s better if an application cleans up its connections over time. We recommend tracking the User Connections performance counter, as patterns in this counter may coincide with other events and be useful information.
But just having a high amount of connections doesn’t prove that was the cause of performance problems. For a smoking gun as to why SQL Server is slow, look in two places:
- SQL Server Wait Statistics (look for THREADPOOL waits)
- The SQL Server Error Log (look for the creepy error below)
Threadpool Waits: a sign of trouble
When lots of connections want to run queries at the same time, SQL Server may need to allocate more worker threads. This process can be slow, and at a certain point SQL Server may reach the maximum worker threads allowed for your configuration.
“New queries have not been picked up by a worker thread”: Thread Starvation
When this gets really bad, you may see a nasty error message in your SQL Server Error Log. This message says something like this:
New queries assigned to process on Node 0 have not been picked up by a worker thread in the last 60 seconds. Blocking or long-running queries can contribute to this condition, and may degrade client response time. Use the “max worker threads” configuration option to increase number of allowable threads, or optimize current running queries.
The message neglects to mention a few critical things:
- Microsoft doesn’t recommend that you change max worker threads, as a general rule
- The amount of worker threads needed has a lot to do with your parallelism settings (and changing parallelism settings is far more common than the amount of worker threads)
- Periodic problems like locking and blocking can also drive up worker threads. Raising worker threads might just mask the real root cause of a blocking problem!
What to do if you have THREADPOOL waits or Thread Starvation
First, don’t panic. Avoid changing any settings before you know exactly what they’ll do.
Here’s where to start:
How’s your parallelism? If you have the “max degree of parallelism” setting at the default value of 0 (or a very high number), a good first step is to learn about CXPACKET waits and options for configuring parallelism.
Do you have lock waits? If you have high lock waits (query), look into which tables have the most blocking and how to track down the queries involved.
Have you identified the most frequently run queries in the problem period? Good indexing or improved TSQL for those queries can dramatically reduce needed threads. (Handy tools here.)
What do the perf counters say? Take a look at those performance counters we recommend to see if there’s a jump in the number of connections when the problem occurs or not.
Approaching the problem this way dispels the mystery of whether all those connections really are the problem (or not).
You may have read that you need to have more than one data file in SQL Server’s tempdb. This can happen even if you’re using blazing fast storage. If you create a lot of tiny objects in tempdb you may hit a bottleneck on special pages that SQL Server uses internally to allocate all those objects. For certain workloads, adding more tempdb files speeds up SQL Server.
Most database administrators aren’t sure how to tell when they need to add more tempdb files. Here’s how you can find out for yourself.
Tempdb stress: my virtual machine setup
I stressed out my tempdb on a virtual machine with 6GB of memory, 4 virtual CPUs, Windows Server 2012 R2 and SQL Server 2014. The virtual machine is backed by solid state storage. My tempdb was configured with a single data file.
How I ran my stress test against tempdb
I used ostress.exe. This command line utility lets you easily run a query multiple times across many threads. You can get ostress.exe for free by downloading the RML utilities.
Here’s the ostress.exe command that I called:
ostress.exe -E -d"tempdb" -Q"exec dbo.tempdbstress" -n5 -r300 -b -q
The dbo.tempdbstress procedure looks like this:
use tempdb; GO CREATE PROCEDURE dbo.tempdbstress AS SET NOCOUNT ON; SELECT TOP(5000) a.name, replicate(a.status,4000) as col2 into #t1 FROM master..spt_values a CROSS JOIN master..spt_values b OPTION (MAXDOP 1); GO
Finding GAM and PFS contention in tempdb with sp_AskBrent® and Wait Stats
While this is running, look at waits in two ways. In one session, take a sample of waits with sp_AskBrent®:
exec sp_AskBrent @ExpertMode=1, @Seconds=10; GO
I see PAGELATCH_UP, PAGELATCH_SH and PAGELATCH_EX waits:
I also have a lot of just plain old CPU wait (the SOS_SCHEDULER_YIELD), because I’m bashing at only 4 virtual CPUs (and I’m running ostress.exe on the same VM as SQL Server).
Finding GAM and PFS contention in tempdb with sp_whoisactive
exec sp_whoisactive; GO
This lets you see the contention if you catch it at the right instant. Bonus: you can confirm exactly what type of pages its occuring on! I can see here that this is explicitly PFS waits. “PFS” means “page free space”: poor tempdb is hitting a hot spot on one of those special pages just recording how much space is available and where.
And here I can see the GAM waits. “GAM” means “Global Allocation Map”. That’s another one of those special pages that can get hot and slow things down in tempdb. It’s tracking where objects are allocated. (There’s also “SGAM” or “Shared Global Allocation Map” pages that you might see.)
Both of these types of waits can be alleviated if you add more data files to tempdb, because then you’ll get more PFS pages and GAM/SGAM pages across the files.
How to tell if you need more tempdb data files
First of all, if you just have one tempdb file, consider adding more files as preventive medicine. Don’t go crazy with this. The formula recommended by Microsoft in KB 2154845 to use one tempdb data file for each logical processor up to 8 processors is a good place to start.
Look at waits that have occurred since startup. Do you have PAGELATCH_UP or PAGELATCH_EX waits? (Note that these don’t have an “IO” in them. PAGEIOLATCH waits are different.)
If you do, that’s not necessarily 100% proof that the issue is in tempdb, but it’s a good indicator. If you don’t, well, it’s probably not a big issue.
If you do find these waits, dig deeper:
- Identify when the PAGELATCH_% waits are growing the most, and dig into those periods with sp_whoisactive. You can log sp_whoisactive results to a table
- Look at your top queries. The queries that are part of the issue might be suitable for caching in an application tier.
Tempdb file configuration reminders
Remember: this only applies to tempdb data files. You only need one tempdb log file.
Also remember that the tempdb data files will be used proportionally with their size. This means that you should keep the files equally sized. You can use trace flag 1117 to force all tempdb files to grow at the same time when one grows, but this will apply that behavior to every database on the instance (not just tempdb).
You can add tempdb files without restarting the SQL Server instance. However, we’ve seen everything from Anti-Virus attacking the new files to unexpected impacts from adding tempdb files. And if you need to shrink an existing file, it may not shrink gracefully when you run DBCC SHRINKFILE. For best results, deploy the configuration first to your pre-production environments, use a change request, and perform the change in a planned maintenance window.
Anti-virus is a devilish beast. If you don’t implement it, someone’s sure to log onto your SQL Server and download some free tool that puts you at risk as soon as you’re not looking. If you do implement it, sometimes it can make you crazy.
And often it’s not even up to you: you need to follow corporate policy.
Here’s what you should know if you’ve got an anti-virus tool on your SQL Server.
Set the right Anti-Virus folder, file, and process exclusions
Good news: you don’t have take my word on what types of files and directories your anti-virus software should exclude. Microsoft lays it all out for you in KB 309422
Special cases: Keep outta my Address space
If you use any of the following products, read KB 2033238, “Performance and consistency issues when certain modules are loaded into SQL Server address space”:
- McAfee VirusScan Enterprise
- McAfee Host Intrusion Prevention
- Sophos Antivirus
- PI OLEDB provider
Virtualization Guests need staggered schedules
Scheduling matters. VMware recommends that you set anti-virus scans to run at non-peak hours in your guests, and to set scheduling so that multiple guests don’t all fire up burn all their resources at the same time on a single hosts.
Need proof? See page 41 in the Performance Best Practices for VMware vSphere® 5.5 Guide.
Windows Failover Clusters
Check out KB 250355 for special steps for Failover Clusters. (Thanks, Gary, for pointing out this has a bit more info than KB 309422 mentioned above!)
If you’re Running Anti-Virus, Run it in Pre-Production, too
Anti-virus isn’t just for production servers. If you run into any problems, you want to be able to check for a repro outside of production, too, right?
Finding the worst queries in SQL Server is easy, right? You just query one of the
dm_exec_* DMVs, bring in the SQL text and the execution plan, and then you spend hours looking at execution plans trying to figure out what’s wrong in the query, much less how to fix it.
All that’s changing – we’re introducing sp_BlitzCache™.
Life Should Be Easy
We try to make things easy for you.
sp_BlitzIndex® make life easy – they find help you find configuration and indexing problems.
sp_AskBrent® helps you find problems that are hitting your SQL Server right now. We decided to take it one step further and apply our expertise to analyzing the plan cache.
There are a lot of queries that will help you get data out of the plan cache. A lot of those queries are also fairly complicated. They present a lot of data, but they don’t help you interpret the results.
Work Just Got Easier
sp_BlitzCache™ analyzes the top poorly performing queries for many different problems. After finding your bad habits, sp_BlitzCache™ builds a list for easy analysis. Your bad habits are outlined in a summary result set:
In addition, each of the queries in the top 50 worst performers will have their warnings called out in a separate warnings column:
This will not work on SQL Server 2005. If you want to analyze your plan cache on SQL Server 2005, you can use our legacy plan cache script.
There are a number of different options available with
sp_BlitzCache™. If you want a refresher, just run
EXEC sp_BlitzCache @get_help = 1. Otherwise, here’s a quick overview:
@get_help - Displays a summary of parameters and columns.
@top - Determines the number of records to retrieve and analyze from
@sort_order - CPU, reads, writes, etc. This controls collection and display.
@use_triggers_anyway - Don’t set this unless you’re crazy. SQL Server 2008R2 and earlier has a bug where trigger count can’t be trusted. It seemed safer to ignore triggers on those versions of SQL Server than give you bad information.
@results - There are three ways to display the detailed results grid. * Narrow will only display averages for CPU, reads, writes, and duration. * Simple, the default, displays totals and averages. * Expert displays all of the information collected by
sp_BlitzCache in addition to metrics normally displayed.
If you want to save the results to a table, you can use
@output_table_name to persist the results to a table. If the table doesn’t exist, it’ll be created for you.