If You Could Only Say One Thing About Writing Good Queries…

SQL Server
33 Comments

If you could give just one piece of advice about writing good queries, what would it be?

I asked Twitter yesterday because I wanted to make sure I didn’t miss anything in a new training course I’m working on, and the replies were fantastic. Here were some of my favorites:

What about y’all? Leave your most important tip in the comments, and read the Twitter thread for tons of gold nuggets.


Execution Plans Don’t Have the Yellow Bang They Really Need.

Execution Plans
12 Comments

When SQL Server is purely making an estimate up out of thin air, with no freakin’ idea what the real answer is, it doesn’t give you any kind of warning. It just produces an estimate that looks scientifically accurate – until you know how it’s actually getting calculated.

Let’s ask SQL Server how many users have cast more than ten million votes. You can use any Stack Overflow database for this:
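
Something along these lines – a sketch, assuming the votes a user has cast live in the Users table’s UpVotes and DownVotes columns:

    SELECT Id, DisplayName
    FROM dbo.Users
    WHERE UpVotes + DownVotes > 10000000;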

To give SQL Server the best possible chance to calculate it, I’ll even create a couple of indexes, which will also create perfectly accurate fully scanned statistics on the relevant columns:
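
Something like this – the exact column choices are a guess, but creating the indexes also builds fresh, fully scanned statistics on both columns:

    CREATE INDEX IX_UpVotes ON dbo.Users (UpVotes);
    CREATE INDEX IX_DownVotes ON dbo.Users (DownVotes);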

When I run the query, it doesn’t actually produce any results:

But if I look at the execution plan, the estimates are a little off:

SQL Server estimated that precisely 2,675,250 rows would come back – but none did.

So where does that 2,675,250 number come from? It sounds like the result of some really scientific calculations, but in reality, it’s just a hard-coded 30% of the number of rows in the table.
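
You can check the math: 2,675,250 is exactly 30% of 8,917,500, which works out to the number of rows in this copy of the Users table. A quick sanity-check query:

    SELECT COUNT(*) AS TotalRows,                          -- 8,917,500 here, working backwards from the plan
           CAST(COUNT(*) * 0.30 AS BIGINT) AS ThirtyPercentGuess  -- 2,675,250: the estimate in the plan
    FROM dbo.Users;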

There’s absolutely no logic behind that estimate.

30% is a purely made-up number, and it’s hard-coded: you get the same answer whether you use greater than OR less than, and with ANY NUMBER YOU COULD PICK. In my case, I picked ten million votes, but you can put any number in that query, and you still get a 30% estimate.

When SQL Server makes an estimate up out of thin air based on hard-coded rules like 30%, it needs to warn us with a yellow bang so that we can jump to fixing that specific problem.

The bigger your query gets,
the more important this becomes.

I started with a really simple example with a small blast radius: that estimation error doesn’t really harm anything. However, in real-world queries, you start to layer in joins and ordering, like finding all of the users and their badges:
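
A sketch of that kind of query, reusing the made-up filter from above:

    SELECT u.DisplayName, b.Name
    FROM dbo.Users u
    INNER JOIN dbo.Badges b ON b.UserId = u.Id
    WHERE u.UpVotes + u.DownVotes > 10000000
    ORDER BY u.DisplayName, b.Name;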

You would hope that SQL Server would go find the users first, and then short-circuit out of the rest of the query if it didn’t find any matching users. You would have also hoped that we didn’t have a hard-coded estimate of 30% here too, though, and I’ve got bad news about your hopes. I’m here to turn your hopes to nopes, because here’s the live query plan:

SQL Server scans both tables at the same time because it assumes so many users will match – might as well start work on the Badges table, right? No sense in waiting around to short-circuit – we’re going to be working with millions of Users rows! Better get this party started.

After you learn about this, your next steps should probably be:


Announcing Fundamentals Week: May 10-14, $295

Company News
0

You love learning from me. You’ve attended some of my free online streams, read my blog posts, and you use the First Responder Kit. You enjoy my laid-back, humorous approach to sharing what I’ve learned over the years.

You love live classes. You’ve tried watching recordings or stepping through demos yourself, but…you just can’t block out the time, and you can’t stay motivated while you’re watching a recording. You like the fast-paced energy of seeing me live, asking me questions, and seeing me respond to your chats.

You hate conferences that feel like Zoom meetings. You’ve tried attending a few community events, but they felt just like sitting through one meeting after another. They didn’t dive deeply into any one subject – they just hopped around from one topic to another, and many of those topics just weren’t relevant to you. They were good for free stuff, but…

You’re ready for Fundamentals Week. On May 10-14, you’ll spend the week with me learning about indexes, query tuning, columnstore, how I use the First Responder Kit, and more. Here’s the lineup, all 9AM-4PM Eastern US time:

Your Fundamentals Week registration gets you into those live classes. Live Class Season Pass holders are always welcome, too. To keep prices low, recordings are not included – this event is really just for folks who like to learn live. If you want the recordings, check out my Recorded Class Season Pass.

Registration is open now. See you there!


Updated First Responder Kit and Consultant Toolkit for April 2021

This month’s release has two gems for SQL Server 2019 users. First, thanks to Greg Dodd, when you’re dealing with parameter sniffing issues, sp_BlitzWho can now show you the cached and live parameters for running queries. This lets you quickly see when the parameters may be wildly different, leading to issues like incorrect memory grants and index choices:
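
Rough usage – the parameter name comes from the sp_BlitzWho notes further down:

    EXEC dbo.sp_BlitzWho @ShowActualParameters = 1;   /* SQL Server 2019 & newer */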

Second, thanks to Erik Darling: when sp_BlitzIndex shows a missing index recommendation, it also shows you one of the queries that triggered the missing index request:
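
Rough usage – just run it against a database on SQL Server 2019, and the sample query shows up alongside the missing index recommendation (the database name here is only an example):

    EXEC dbo.sp_BlitzIndex @DatabaseName = N'StackOverflow';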

Both of these require SQL Server 2019, and they’re both just awesome for performance tuners. There’s lots more new stuff this month too – keep scrollin’ for more details. Unlike Microsoft’s SQL Server Cumulative Updates, we actually take the time to link to each Github issue so you can see exactly what changed. We figure it’s the least we can do for you, dear reader.

When you find a bug or want something changed, read the contributing.md file.

Watch and learn

To get the new version:

Consultant Toolkit Changes

I updated it to this month’s First Responder Kit, but no changes to querymanifest.json or the spreadsheet. If you’ve customized those, no changes are necessary this month: just copy your spreadsheet and querymanifest.json into the new release’s folder.

sp_Blitz Changes

  • Fix: no more arithmetic overflow when we check drive space on a drive with 8MB free. (#2837, thanks ScottL1969.)
  • Fix: removed language that suggested SQL Server needed a domain account. I don’t want sp_Blitz to be used as a security audit tool, and our warning wording was misleading there. (#2854)
  • Fix: the USERSTORE_TOKENPERM cache size check had the wrong math. (#2858, thanks Erik Darling.)
  • Fix: volume names were getting truncated on the drive space alerts. (#2865, thanks Erik Darling.)

sp_BlitzCache Changes

sp_BlitzFirst Changes

  • Improvement: when @OutputXMLasNVARCHAR = 1, output is trimmed to 32K for easier copy/pasting into Excel. (#2823, thanks Ties Voskamp.)

sp_BlitzIndex Changes

  • Improvement: SQL Server 2019 users can see an example of the query that triggered a missing index request. (#2185, thanks Erik Darling.)
  • Improvement: new @ShowColumnstoreOnly parameter lets you visualize the columnstore contents for a table. This was always shown with table-level details (if you pass in a table’s name), but it was the last result set, and it can be pretty long for big tables. This way your results are easier to see on big monitors. I’ll be using this a lot in my Fundamentals of Columnstore class. (#2851, thanks DjHeath2000.)
  • Improvement: new @SortOrder options for ‘create date’ and ‘modify date’ for @Mode = 2, the inventory of your indexes. (#2845, thanks Erik Darling.) There’s a quick usage sketch of both new options after this list.
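
Rough calls for the two new options – the database and table names are made up:

    /* Columnstore contents for one table: */
    EXEC dbo.sp_BlitzIndex @DatabaseName = N'StackOverflow', @TableName = N'Votes', @ShowColumnstoreOnly = 1;

    /* Index inventory, newest indexes first: */
    EXEC dbo.sp_BlitzIndex @DatabaseName = N'StackOverflow', @Mode = 2, @SortOrder = 'create date';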

sp_BlitzLock Changes

  • Fix: removed questionable documentation text. (#2849, thanks AdrianB1.)

sp_BlitzWho Changes

  • Improvement: new @ShowActualParameters (default 0) will show you the parameters that were used to compile and to run the execution plan. Only works on SQL Server 2019 & newer. (#2591, thanks Greg Dodd for heroically working around memory dump bugs in SQL Server to get this across the finish line.) These new columns also get logged to sp_BlitzWho’s output tables, and we’ll automatically add the columns if they don’t exist.
  • Fix: the above improvement broke on case-sensitive collations. (#2870, thanks Adrian Buckman.)

sp_DatabaseRestore Changes

  • Fix: no more arithmetic overflow when you try to restore a database with over 32,768 log backups. (#2839)

SqlServerVersions.sql Changes

  • Fix: corrected the build number for SQL Server 2017 CU23. (#2856, thanks Anthony Green.)

For Support

When you have questions about how the tools work, talk with the community in the #FirstResponderKit Slack channel. If you need a free invite, hit SQLslack.com. Be patient – it’s staffed with volunteers who have day jobs.

When you find a bug or want something changed, read the contributing.md file.

When you have a question about what the scripts found, first make sure you read the “More Details” URL for any warning you find. We put a lot of work into documentation, and we wouldn’t want someone to yell at you to go read the fine manual. After that, when you’ve still got questions about how something works in SQL Server, post a question at DBA.StackExchange.com and the community (that includes me!) will help. Include exact errors and any applicable screenshots, your SQL Server version number (including the build #), and the version of the tool you’re working with.


Who’s Hiring in the Database Community? April 2021 Edition

Who's Hiring

I’ve gotten a lot of emails from friends lately who are looking for work. It seems that as some companies go back to the office, some people are realizing they want to work remotely forever. And on the flip side, other folks are realizing they desperately want to get back into an office, but their companies have gone fully remote.

So let’s set up some rapid networking here. If your company is hiring, leave a comment.

The rules:

  • Your comment must include the job title, and either a link to the full job description, or the text of it. It doesn’t have to be a SQL Server DBA job, but it does have to be related to databases. (We get a pretty broad readership here – it can be any database.)
  • An email address to send resumes, or a link to the application process – if I were you, I’d use an email address so you know which applicants are readers here; they might be more qualified than the applicants you regularly get.
  • Please state the location and include REMOTE and/or VISA when that sort of candidate is welcome. When remote work is not an option, include ONSITE.
  • Please only post if you personally are part of the hiring company—no recruiting firms or job boards. Only one post per company. If it isn’t a household name, please explain what your company does.
  • Commenters: please don’t reply to job posts to complain about something. It’s off topic here.
  • Readers: please only email if you are personally interested in the job.

If your comment isn’t relevant or smells fishy, I’ll delete it. If you have questions about why your comment got deleted, or how to maximize the effectiveness of your comment, contact me.

I close the comments here after a week, and publish another post each month in the Who’s Hiring category so y’all can get the latest opportunities.

If you want to be notified as people leave new job comments here, there’s a “subscribe without commenting” option where you leave a comment.


Free TempDB Training Class Next Monday

TempDB
1 Comment

You’ve been working with Microsoft SQL Server for a couple of years, and you know a little bit about TempDB. You’ve heard that temp tables and table variables have different performance characteristics than regular user database tables, and you’ve heard that you’re supposed to have multiple data files.

You’re wondering what exactly goes on behind the scenes.

This class is for curious folks who want to learn:

  • What uses TempDB, and how that affects performance: not just temp tables and table variables, but also triggers, cursors, sorting indexes, workspace spills, the version store, and more
  • How to host TempDB depending on how you use it: understanding what goes in the log file and the data files, why the number of data files matters, and whether your particular workload should be on SAN, local storage, or ephemeral storage, and why
  • How to monitor it: after you understand what uses it and how to host it, you need to keep tabs on how its performance changes over time using DMVs, Perfmon counters, and the First Responder Kit scripts

This course is 90% demos: the only slides are the introductions at the start of the day, illustrations to support a few topics, and the recap at the end of the day. The rest of the time, we’ll be working in SQL Server Management Studio. Roll up your sleeves and join me!

No registration required – just add this event to your calendar, then show up at BrentOzar.com/training/live/ on next Monday at 8AM UTC. This is a Europe-friendly time zone class – folks in other parts of the world will either need to get up at an oddball time, or else buy the recordings.

If you’d like to follow along with the demos during class – totally optional, not required – you’ll need a SQL Server 2016 or newer instance. Your server’s power doesn’t really matter – you can run this class’s labs on your local desktop just fine as long as you’ve got enough space for the database. I will show you the differences in SQL Server 2019, but if you’re not using 2019 yet, no worries – you can just watch those portions of the demos. You’ll also need any Stack Overflow database. In the class, I’ll be using the 50GB StackOverflow2013 version, but any version/size will work.

See you in class!


Azure SQL DB Frequently Asked Questions

Azure SQL DB
20 Comments
I’ve seen your search history.

Let’s get you some answers.

Is Azure SQL DB PaaS, IaaS, or SaaS?

Azure SQL DB is platform-as-a-service (PaaS). Microsoft manages most common database tasks for you, and they act as your DBA.

This also means that like a DBA, Microsoft will tell you “no” if you try to implement something that they don’t support. There’s no negotiating: the answer is just no.

Who is Azure SQL DB for?

Azure SQL DB is a good fit for new applications.

It’s a great fit for software-as-a-service companies and independent software vendors (ISVs) who want to put each client in their own database. Azure SQL DB gives you performance and cost controls because you can tune how many resources each database gets, or put them into pools of resources that are shared across lots of databases.

SaaS and ISV companies don’t usually have full time database administrators, so they like that Microsoft handles the basic care and feeding of Azure SQL DB. There’s no dealing with patching, clustering, backups, and recovery – Microsoft manages all that for you.

Who is Azure SQL DB NOT for?

It’s not a good fit for companies with existing applications that do cross-database queries. If your app is scattered across multiple databases, and you’re used to doing joins across them, you’re going to be frustrated with Azure SQL DB. It’s technically possible, but it’s a giant pain in the rear, especially for established databases with hundreds or thousands of tables in each database.

It’s also not a good fit for companies with a lot of SQL Servers that already talk to each other doing things like linked servers, replication, and Agent jobs that fire off tasks on different servers at different times.

If you fall into the above categories, that’s where Azure SQL DB Managed Instances come in.

Compare Azure SQL DB versus Managed Instances.

Azure SQL DB is for one standalone database, versus Managed Instances which are more like one standalone SQL Server with multiple databases on it.

If you’re used to managing SQL Server, you’ll probably prefer Managed Instances: they have things like cross-database queries and Agent jobs. Databases fail over together, too. It’s basically like IaaS Availability Groups in the cloud, managed by Microsoft for you. Make no mistake, though: they’re still PaaS, and they’re priced accordingly.

Compare Azure SQL DB versus Synapse Analytics.

Azure SQL DB is mostly for transactional systems like web sites. Synapse Analytics is for data warehouses.

In a transactional system, queries deal with a few rows at a time, and users complain when queries take more than a second or two to complete.

In a data warehouse, people run queries that deal with millions of rows at a time, and their complaints tend to start when reports take more than 10-30 seconds to finish. They’re used to waiting longer because of the size of the data they’re dealing with.

Could you run a data warehouse in Azure SQL DB? Sure, but…it’s going to be expensive due to Azure SQL DB’s throughput limitations. It’s much easier (and more cost effective) to load large amounts of data into Synapse Analytics. Just don’t expect Synapse Analytics to handle a thousand queries at a time each with millisecond-level response times: it’s not designed for that.

Is there a free tier for Azure SQL DB?

No. The cheapest Azure SQL DB is the Basic tier at $5 per month, but it only gets 5 database throughput units (DTUs). This is effectively useless as even the simplest queries will take seconds to complete.

This is frustrating for shops that are used to using the free SQL Server Developer Edition for their development work, and hosting that development server centrally for multiple developers to work on at the same time. That’s okay: the better approach is to have every developer use their own local development database, like in a Docker container, and –

What’s that you say? Your development laptops aren’t large enough to host your full database size? Well, no, of course not, but you’re not supposed to be developing with production data. You’re supposed to be using an empty database for your proof-of-concept work, and then testing with a manufactured set of similar-to-production data.

What’s that? I’m afraid you’re going to have to speak up – your monitor has a really crappy microphone. Oh, I hear you – you’re asking about your QA, load test, and user acceptance test environments, which use databases restored from production. Well, uh, I don’t have good answers there: there’s no free Developer Edition for Azure SQL DB, and you can’t restore your Azure SQL DB databases down to a regular SQL Server. There are some kludgy hacks around exporting all your data, but those are performance-intensive and slow. Let’s just ignore this business problem and move on for now.

What about Azure SQL DB Edge?

The Azure SQL Edge page says it simplifies application development, but that’s not really what it’s for. It’s an innovative new way for Microsoft to extract more licensing fees by getting you to host databases in more places.

Wait, I don’t think I said that quite right. I mean when you have important data that lives in remote places, like in sensors or frequently-disconnected devices, Azure SQL Edge lets you have all of the power and complexity of a database server there, without the ease of management or cost-effectiveness of a centralized…

Hold on, I don’t think that was quite it either. Lemme get back to you on that one.

Why is my Azure SQL DB slow?

Most of the time when I work with clients, it’s because they criminally under-sized the server.

Azure SQL DB can be provisioned in very, very small hardware sizes for hobbyists or development purposes, and it’s very inexpensive. However, I find that folks provision it in those inexpensive tiers, and then they’re totally disappointed by the performance. You’re not hitting Azure SQL DB architectural limitations – you just provisioned it too small.

To find out if you’re hitting your account’s limitations, install and run sp_BlitzFirst from my free First Responder Kit. It’ll tell you if you’re hitting your Database Throughput Unit limits:
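
The simplest way to run that check – by default it takes a short sample, then warns when you’re bumping into your limits:

    EXEC dbo.sp_BlitzFirst;   /* add @ExpertMode = 1 for more detail */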

Azure SQL DB is maxing out

If you’re hitting your limitations on CPU, throughput, memory usage, etc, then you have two simple choices:

  • Turn up your size to a higher number, or
  • Tune your queries & indexes

If you’re spending less than $500 per month on Azure SQL DB, it probably just makes sense to turn the size up. If you’re spending more than that, it’s probably time to attend a performance training class or hire me for performance tuning help.

How can I learn about Azure SQL DB for free?

If you prefer to learn via live online events, Microsoft’s running a free Azure SQL Digital Event on May 4.

If you like self-learning – reading through material and running demos yourself – check out Microsoft’s SQLWorkshops Github repo. Scroll down through the readme, and there’s a list of courses. Each course is very well thought-out with tons of resources and demos to run.

If you prefer books, get Azure SQL Revealed by Microsoft’s Bob Ward. It’s fairly new, so it’s still relevant – which can be tough to find in the constantly changing churn of the cloud.


Your Top 10 Ideas for SQL Server

SQL Server
45 Comments

You’ve submitted thousands of bug reports and feature requests at feedback.azure.com, and here’s what you’ve voted as the top 10 requests right now:

#10: MIN/MAX as non-aggregate functions. PostgreSQL and Oracle have the GREATEST/LEAST functions that work like COALESCE, allowing you to pass in a list of values and get the highest or lowest. Sounds nifty.
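
Until something like that ships, the closest T-SQL workaround is a VALUES table constructor – a sketch using a couple of columns from the Stack Overflow Users table:

    SELECT u.Id,
           (SELECT MAX(v) FROM (VALUES (u.UpVotes), (u.DownVotes)) AS x(v)) AS Greatest,
           (SELECT MIN(v) FROM (VALUES (u.UpVotes), (u.DownVotes)) AS x(v)) AS Least
    FROM dbo.Users AS u;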

#9: SSMS keyboard shortcut to execute the current statement. Azure Data Studio has this and I love it so much.

#8: Better NoSQL functionality. Okay wait hold on, before you throw birds at the screen, they’re talking about how the world’s standards for XQuery, XPath, and XSDs have moved on since SQL Server first implemented those features. Alright, now you can throw birds.

#7: Support DISTINCT for STRING_AGG. Going by the notes in the request, it sounds like MySQL does this, and so when people come over from MySQL, they’re surprised at how much harder this task is in SQL Server. I honestly never noticed.
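
The usual workaround today is to de-duplicate in a derived table first – a sketch:

    SELECT STRING_AGG(CAST(d.Location AS NVARCHAR(MAX)), ', ') AS Locations  /* cast avoids the 8,000-byte limit */
    FROM (SELECT DISTINCT Location FROM dbo.Users WHERE Location <> N'') AS d;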

#6: Run CHECKDB automatically in the background. Despite the best efforts of the blogosphere, too many people just don’t check their databases for corruption. I get it – it’s not part of SQL Server’s defaults, it’s not a setup option, you have to learn about it to understand the risk, and Microsoft is never going to mount a user education campaign talking about how many corruption bugs SQL Server has. Instead, Microsoft just has to start fixing the problem by making databases more reliable.

#5: Add row position column to STRING_SPLIT. When you split a string with STRING_SPLIT, you get a table with the values, but they’re not guaranteed to be in order. Sometimes you really need to know the order when you’re processing a list.

#4: Add an SSRS ReportViewer for ASP.NET Core. It’s been marked as “under review” for over 3 years, but I have a pretty bad feeling about this. SQL Server Reporting Services strikes me as an NPC these days, with Power BI taking the new starring role. I wouldn’t expect SSRS to get a whole lot more development.

#3: Restore a table from backup. I really, really want this one because it’s the most common reason that we have to do restores: someone dropped a table or hosed up its contents.

#2: Put the Debugger back into SSMS 18. When Microsoft removed database diagrams and the debugger, I had a little fun with the blog post announcement, kicking those two features in the teeth. (I like using extreme announcement tactics on not-really-extreme things sometimes.) I’m completely ambivalent about those two features, though.

And your #1 top-voted request: Dark Mode for SSMS. And bad news, Microsoft has marked this as “Unplanned”, writing:

Unfortunately the complexity to implement dark mode into SSMS v18 is prohibitive and this item remains unplanned. Regular updates and improvements to SQL Server Management Studio are continuing, but most of the net new innovations in the graphical tooling space can be expected to be seen in Azure Data Studio.

You can kinda get dark mode with this workaround, and it’s a bummer that we’re so close but yet so far away.

For the longest time, I thought the dark mode request was dumb. Sure, I use dark mode on most of my apps, but I thought it was frivolous to ask for dark mode support in SSMS when that tool gets so little love. If Microsoft is going to allocate developer time to SSMS, I wanted real, meaningful improvements – not dark mode.

Then someone casually said to me, “You can tell that SQL Server and Azure SQL DB are legacy databases because SSMS doesn’t even have dark mode. Even our IBM mainframe development app has dark mode.”

Okay, that stung a little. It’s wrong…but it still stung.

I mean it is wrong, right?


Pop Quiz: What Do These Things Cost Today?

Hardware
17 Comments

Price check on Aisle 2021.

My main consulting job is a 2-day SQL Critical Care® where I help turn around slow SQL Servers. Companies come to me because they’re not sure if it’s a hardware problem, a database configuration issue, bad app code, or what, and they’ve been getting conflicting answers from different teams.

Before you spend money on consulting, though, stop to review what things cost.

Often, I find a whole team of people who have been struggling with slow performance – only to discover they’ve got less hardware than my laptop, and that it would only cost a few thousand bucks to make a really dramatic difference instantly.

So let’s do a price check. Without searching the web for prices, guess how much the below things cost – and you don’t even have to guess the exact price, just put ’em into buckets for perspective.

If you don’t see a form above, like if you’re reading the blog via RSS or email newsletter, you’ll need to visit my blog to take the quiz.

Got questions? Surprised? Leave a note in the comments.


Never, Ever, Ever Start T-SQL Comments with Two Dashes

T-SQL
117 Comments

There are two ways you can write comments in T-SQL:
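
For reference:

    -- this comment starts with two dashes and runs to the end of the line
    /* this comment is framed, with an explicit start and end */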

Never, ever use two dashes. Never.

The problem is that monitoring tools and diagnostic management views often dump out the T-SQL in one long line. So this query:
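
(a stand-in example)

    SELECT *
    FROM dbo.Users  -- the most important table
    WHERE Reputation > 1000
    ORDER BY DisplayName;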

Becomes this:
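
(the same stand-in, flattened onto one line)

    SELECT * FROM dbo.Users  -- the most important table WHERE Reputation > 1000 ORDER BY DisplayName;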

And you can’t tell where the query really ends. This REALLY causes problems with real-world-size queries with joins, CTEs, etc – stick a double-dash comment anywhere in the query, and you’ll break query-formatting tools like SQL Prompt.

Instead, frame out your comments:
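
(the stand-in again, with the comment framed)

    SELECT *
    FROM dbo.Users  /* the most important table */
    WHERE Reputation > 1000
    ORDER BY DisplayName;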

That way, even if the monitoring tool or DMV dumps everything onto one line:
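
(still one line, but the comment ends where it should)

    SELECT * FROM dbo.Users  /* the most important table */ WHERE Reputation > 1000 ORDER BY DisplayName;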

It’s still obvious what the query is doing, and T-SQL formatting tools can still rework it with proper line breaks.

I know some of y’all will want to lecture me about getting a better monitoring tool, but you wouldn’t believe what kinds of crappy tools are out there, and some DBAs just have to live with those tools. Frame your comments, and your successors will thank you. (I say “successors” not because they’re successful. I’m not saying they’re better than you. They just came after you, trailblazer.)


Free Columnstore Training Class on Tuesday

Fundamentals of Columnstore Indexes

Your report queries are too slow.

You’ve tried throwing some hardware at it: your production SQL Server has 12 CPU cores or more, 128GB RAM, and SQL Server 2016 or newer. It’s still not enough to handle your growing data. It’s already up over 250GB, and they’re not letting you purge old data.

Will columnstore indexes help?

Next Tuesday, you’ll learn:

  • How columnstore data is stored, and how that impacts your architecture choices
  • How to do a hands-on self-assessment of your workloads, data, and server to figure out whether columnstore indexes make sense for you
  • Why partitioning makes so much sense for columnstore indexes
  • How to do a proof-of-concept implementation with initial creation, querying, ongoing loads, and index maintenance

No registration required – just add this event to your calendar, then show up at BrentOzar.com/training/live/ on next Tuesday at 8AM UTC. This is a Europe-friendly time zone class – folks in other parts of the world will either need to get up at an oddball time, or else buy the recordings.

If you’d like to follow along with the demos during class – totally optional, not required – here’s how to set up a lab VM. This requires a specialized full-size 300GB copy of the Stack Overflow database. This is not the kind of class where you wanna follow along on a low-end laptop: this requires the Mastering-level setup because columnstore is all about big data.

See you in class!


How My Attitude and Branding Changed Over 20 Years

This month, to mark the 20th anniversary of BrentOzar.com, I’m stepping back and looking at the big picture rather than blogging about database problems & solutions like I usually do.

It wouldn’t be a good navel-gazing series without a couple thousand words looking inward at myself.

When I first got started writing online, I was writing for myself. I blogged because it was enjoyable. I just liked writing. (I still do.)

10 years ago, I wanted to be 3 things.

Technical, approachable, likable

In the late 2000s, I started thinking more about branding. I joined Quest Software in 2008, and I spent a lot of time working with the marketing team. Smart, friendly people like Christian Hasker, Andy Grant, and Heather Eichmann got me to think more about how my blog looked to someone who was just walking in the front door for the first time, so to speak.

I went through a branding exercise to come up with 3 words to describe my online persona, and I came up with:

  • Technical – I wanted to write technical details about the real-world problems I ran into and how to solve them. This sounds obvious in retrospect, given the blog that you’re reading, but in the early years, my writing was all over the place, like buying pantyhose to build a turtle tank filter.
  • Approachable – I wanted readers to feel like they could talk to me about my work and their own database problems. I didn’t want to seem like I was locked in an ivory tower.
  • Likable – I wanted readers to laugh, enjoy their time with me, and come back for more.

I repeated that mantra over and over. I stopped writing stuff that wasn’t technical, I showed up on every social media platform, I got to know everybody I could in the SQL Server community, and I tried to write content everybody would love, universally. My SQL 2008 R2 Review post was a good example of what I aimed for during this period.

Over time, I gave up on approachable.

Thinking about smashing

I realized that being approachable meant that people would actually approach me. That sounds great in theory, but there was an endless stream of people who wanted me to do their work.

For free.

Right away.

I don’t remember the first time I got an email that said something like, “My client needs me to do ___, please walk me through it.” I wish I’d had a video of me reading those words. Was I dumbfounded? Suspicious? I don’t remember the first time, but wow, did those emails come in like crazy over the years.

While I wanted to be both approachable and likable, I had zero interest in doing someone else’s job for free and being taken advantage of. At first, I kinda politely said, “That’s a great idea for a blog post. I’ll make a note of that and touch on it at some point in the future.” I pretty quickly abandoned that and just used a set of GMail canned responses.

I still wanted to be approachable, but…I didn’t want that to be one of my 3 core branding words. If that was one of the first things people thought about when they thought of me, they would bring me every tiny problem for free as if I was their virtual coworker, and I simply couldn’t afford to do that. I wouldn’t have any family time left.

Over time, I gave up on likable too.

My smile hasn’t changed

Let’s be honest here, dear reader, because you know me pretty well: we both know I’m sarcastic.

It was easy to keep the sarcasm toned down in blog posts because I’m a huge believer in scheduling posts ahead of time. If I click Schedule instead of Post, then I have more time to think over the contents. I’ll come back to a post a day or a week later, contemplate the contents, and then take out the sharpest humor.

As I started spending more time on social media, especially Twitter, the sarcasm shone right through. I would see a tweet, respond sarcastically immediately, and then regret it a day later. (Or not.) I can sit here and say, “That person deserved a sharp reply so they could see the error of their ways,” but we’re talking about me here, and the problem is that I’m sarcastic.

Microsoft would bring out a feature that didn’t make any sense, and I’d tweet about it sarcastically. The Professional Association for SQL Server would shoot themselves in the foot again, and I’d take to social media with some witty banter. The problem was that I was gradually offending a lot of people in my industry: they might forget about their bad decision, but they wouldn’t forget the way I’d tweeted publicly about it.

A big part of the problem: my metrics. I viewed the blog as the big megaphone (because I’ve got over 100K email subscribers and tons of web hits), and Twitter as the quieter water cooler chat, ephemeral stuff. However, someone took me aside one day and said, “You realize you’ve got more than twice the Twitter followers of anybody else in our industry – it’s not even close – and when you say something kinda casually on Twitter, tens of thousands of people can hear it?” That sounds like a humblebrag, but it’s not: I’m humbleashamed. Twitter was a big megaphone too, and as I looked back, I felt really bad about the way I’d used it over the years because it worked against the likable goal. (Those last few words are important, though.)

That moment caused me to rethink the “likable” part of my branding.

I could have pivoted and said, “I’m gonna turn this ship around, and be genuinely likable again online.” But the reality is that the damage was done: nobody in the industry was ever gonna see me as Mother Teresa, and let’s be honest: I’m not Mother Teresa. I can’t market something I’m not. (I mean, I could. But that goes against my core values.) Lemme zoom way out for a second.

Your life is like a blank white canvas.

Whether we’re talking about your career, your job skills, a relationship, your online reputation, whatever, the same concept applies: you start with an empty canvas. Over time, every action you take fills that canvas in, painting a picture.

From time to time, you need to step back and survey your work. Are you painting the right picture? Have you been obsessing over a tiny detail that doesn’t really matter? Do you need to take some major corrective actions? Is there anything you need to paint over? Do you need to change your approach to the rest of the canvas given what you’ve painted so far?

Let’s take job skills, for example. I got started in hotels, then used my hotel experience as a developer & sysadmin for hotel companies. I used that experience to become a database administrator, and filled in more space with work in virtualization, storage, cloud, startups, etc. As long as I kept building on top of that same picture, I could make it better and better over time.

You don’t really get another canvas.

If you want to start painting a different picture, you can take the space you have left and start there – but you only have so much time and mental space. Your new work is going to be heavily influenced by the work you’ve already done on that canvas. If you hit reset on your skills and try a totally different industry, you might paint a smaller, better picture for yourself – for however you define “better” – but it’s going to be smaller in scope. (My Dad did that, starting over as a nurse after decades in business, and I’ve always admired the work and dedication that it required.)

Similarly, branding is like that: if you’ve built up a large online following that sees you a certain way, and you wanna change it, you’ve already got a partially-filled-in canvas. Plus, you gotta keep in mind that you painted this stuff: it’s your style. You might just want to accept yourself for who you are and run with it.

A Dita Von Teese quote helped me decide.

Around the same time I was having the epiphany around the “likable” part of my branding, I saw this:

That quote right there changed my life.

It rang so true for me because I thought back about my experience with sp_Blitz. When I first launched it, there were a few people who griped, “It’s free as in beer, but not free as in speech – it’s not really open source.” When I open sourced it with the MIT license, they griped, “He’s still collecting emails to download it.” When I put it on Github, they griped, “Well, it still has his name in it.” I wanted them to like me, but it was never good enough, and I had to stop beating myself up. Some people just aren’t going to like me and what I do, full stop, and Dita’s quote taught me that that’s okay.

I thought about the people that I admired at the time, and for every one of ’em, there were plenty of haters. But who cares? *I* liked them, and so did hundreds of thousands (or millions) of other people. They made my life more enjoyable and they taught me things. There’s no such thing as a media personality that everyone likes. (I used to think Oprah was an exception – everybody likes Oprah, right? – but the aftermath of the Harry and Meghan interview taught me that even Oprah has her haters.)

In 2016, I came up with 3 new attributes.

When you’re writing code or managing a project, you need to know where you’re going. Blogging, writing presentations, and live streaming are no different: you need to understand the end result you want to achieve, and have a meaningful path to get there. So in 2016, I revisited my branding words to come up with a new strategy for myself, and I came up with:

  • Technical – no changes here. I do share lifestyle & business stuff on my Instagram, my TikTok, and Ozar.me, but those are just hobbies for my own fulfillment, kinda like how blogging was for me back in the early 2000s.

  • Pain-relieving – I really work hard to focus my training and consulting on the most urgent, most relevant things that will make your SQL Server pains go away. I’m rarely about learning just for the sake of learning: the things you spend time learning and doing need to immediately pay off in a happier SQL Server.
  • To the point – your time is valuable, and so is mine. We’re both going to cut through the BS as quickly as possible in order to get what we both need to succeed. I want you to be able to stop working with me as quickly as possible so that you can go back to shipping value for your customers or your own career. The faster I can get your server fixed or get the right knowledge into your brain, the better.

(And, uh, this blog post is none of those three. But look, I’m allowed to go off-topic here now and then. This is the 20th anniversary of BrentOzar.com, and I warned you that this series would be navel-gazing.)

The last one is the biggest change over the last 10 years.

Screw civility: if I think something is a waste of your time, I’m going to be up front and tell you right away in no uncertain terms.

I’ll use an easy, obvious example: Microsoft’s certifications have been a waste of your time and money for years. They’ve changed the certification paths, branding, and names so many times that there’s no market awareness and no value to your career. The only time you should invest your precious time in those is if your company is picking up the tab, they allow you to study during work hours, and they require the cert. Those certs are just garbage, and they’re not going to become the next hot commodity. You’re not suddenly going to be an in-demand professional just because you crammed and got some irrelevant cert.

I’m not saying you’re not allowed to waste time. I love wasting time – I spend hours on TikTok, hahaha. We just need to be clear on what’s a waste of time, and what’s marketed to you as something productive and valuable for your career – when it’s really just a waste of time. (I’m picking on Microsoft certifications here, but the same holds true for lots of things that I pan.)

Yes, this means people who, say, sell Microsoft certifications aren’t going to like me. It took me years of self-examination to realize that I really shouldn’t care what those people say anyway: I don’t respect them at all, period, full stop. I actively disrespect them because they’re wasting your time and money, and that really pisses me off, dear reader.

When I say dear reader, I mean the core set of readers around here who are struggling to succeed while maintaining a work/life balance. You, the kind of people who stick around to the end of a blog post like this, the ones who have been reading this blog for years, who regularly find me in your Google search results and click on the links because you know I’ll get you an answer quickly, the ones who leverage the First Responder Kit to do your jobs better and faster.

If I think someone on a live stream is wasting YOUR time disrespectfully, I cut straight to the chase and tell them. I talked about my sarcasm on social media – I think it comes through much more strongly during my Twitch & YouTube live streams. For example, if someone comes to a free class and they didn’t follow the instructions in the prerequisites, they’re going to get a quick, to-the-point instruction that they didn’t do their homework, and we’re moving on. I’m not here for them – I’m here for you, dear reader.

Like peaches and Dita Von Teese, I’m not for everybody.

And I’m okay with that.


Sale Closes Wednesday: Save 75% on My Training Classes, and the EU/UK Can Buy Too.

Company News
6 Comments

Good news, party people: we’re finally selling to the EU/EEA/UK again!

To celebrate, during March you can use coupon code Happy20th to save 75% on my Live Class Season Pass, my Recorded Class Season Pass (Fundamentals or Masters), or any of my online classes.

You can also buy my classes now in Euros, GBP, or US dollars, whichever currency is more convenient for you. When you click the Buy button, you get your choice of currency, and it calculates your VAT and helps you with exemptions as well.

I’m also offering live online classes in Europe-friendly times: 8:00-16:00 UTC. (UTC since I’m living in Iceland at the moment.) Here’s the upcoming class schedule:

When you buy a Live Class Season Pass, you get access to all of my live online classes during your subscription. During each year, I’ll teach each of the classes at least 5 times:

  • 2 in US-friendly times, 9AM-4PM Eastern
  • 2 in EU-friendly times, 8:00-16:00 UTC
  • 1 on weekends (Fri/Sat/Sun) in US times

I schedule ’em about 3-4 months in advance, so another US-friendly rotation for the Mastering classes will start around August.

See you in class!


Free Live Webcast: The First SQL Server Webcast You Should Ever Attend.

When you pass in a query, how does SQL Server build the results?

It sounds really simple, but we all take so much for granted.

SQL Server stores most data in 8KB pages – kinda like spreadsheet pages. For example, here’s a simulated 8KB page from the dbo.Users table in the Stack Overflow public database:

dbo.Users clustered index

In a free live How to Think Like the SQL Server Engine webcast on April 7th, I’ll walk you through how SQL Server uses these spreadsheet pages to deliver your query results.

I’ll give you a scenario: I’m the end user sending in queries, and you’ll play the role of the SQL Server engine. Using simple spreadsheets as tables, you’ll learn how SQL Server builds execution plans, uses indexes, performs joins, and considers statistics.

This one-hour session is for DBAs and developers who are comfortable writing queries, but not so comfortable when it comes to explaining nonclustered indexes, lookups, sargability, and fill factor.

To prep for the session, download and print out this 3-page PDF, and then register for the session here. See you there!


The Way Developers and Users Interact is Broken.

This month, to mark the 20th anniversary of BrentOzar.com, I’m stepping back and looking at the big picture rather than blogging about database problems & solutions like I usually do.

Twenty years ago, I tried to switch to Linux.

Again.

That wasn’t my first attempt, nor was it my last – there were so many more – and every time, I had a terrible experience. I struggled to get the right combination of hardware, software, and configuration settings. But you know what the worst part was? Trying to get support for all of it.

I saw myself as a decently skilled sysadmin and developer with a good amount of experience and troubleshooting skills, but I had one hell of a bad time getting support. Things people said would work, didn’t, and nobody seemed terribly interested in helping me get it right. Documentation, forums, email lists – it was a byzantine mess, nobody was accountable for getting things right, and I felt firsthand the impact of gatekeepers and condescending answers. The more I tried to make it work, the more burned out I got.

In 2006, I gave up and got a Mac.

I gave up the idealistic hope of a complete open source stack, configuration via simple text files, and the feeling of being a wizard. I wasn’t an operating system wizard. I just wanted something to boot up, connect to WiFi and a VPN, install applications, and sleep & resume, and when something broke, I wanted to get support from a company that would be personally accountable for getting it working again. I was fine with handing over money and permissions in order to get simplicity.

Today, I maintain the open source
First Responder Kit.

It has SQL Server scripts like sp_Blitz that help you assess your SQL Server’s health and performance. They’re the same scripts I use in my own consulting work, and are used by data professionals around the world. Back when I used to track downloads, I was tickled pink to see that even Microsoft’s own support staff downloaded them.

Today, as an open source maintainer, I see the other side of the story.

Many users don’t read the documentation. They try to use the scripts in completely inappropriate and unsupported ways. They complain when the scripts don’t do what they want. They see all this as my own responsibility, saying things like:

Your call is important to us, please hold.
  • “You should make it be able to do ____”
  • “I’m very disappointed that it doesn’t ____”
  • “It needs to ____ or else it’s worthless”
  • “Please do the needful”

When I point out the readme and the documentation – which clearly state that if you want a new feature, you need to start coding it yourself – it almost always falls on deaf ears. I’m endlessly thankful to the folks who actually do contribute code, documentation, and support, because they’re so few and far between compared to those who see open source as free consulting & development.

This is hilarious timing, but as I was writing this, my watch vibrated with a direct message in the SQL Server community Slack from someone who wants help. I responded asking them to try in the #FirstResponderKit channel as the readme says so that other folks can help too. (I checked back an hour later – they never did.)

I totally understand that a lot of the First Responder Kit users who need help are just as qualified as I was when I was trying to get Linux help. I wasn’t an amateur, and neither are these users. But it’s just an endless flood: there are always way, way more inexperienced users than there are experienced users. There just aren’t enough experienced users with free time to help, and many of the people who see themselves as experienced…aren’t.

It happens with Microsoft, too:
try Azure Data Studio.

It’s one of Microsoft’s newest open source products for database developers, a SQL Server Management Studio replacement for people who write T-SQL rather than manage servers. When you run into a problem with Azure Data Studio, you’re expected to file it in Azure Data Studio’s Github repo.

There are over 1,000 open bugs – including serious bugs that I hit all the time – many of which have been open for years.

When you browse that list, it’s hard to determine if the software is buggy, if the users need better support, or if it’s a combination of both. Pro tip: it’s always a combination of both.

There are almost 1,000 open requests for enhancement, and many aren’t trivial requests: they’re monsters, like adding support for entirely new database platforms. Users feel free to ask for a pony because there’s no cost to them for making the request. Any bozo can ask for a pony by writing up a Github issue, and they can gather thousands of upvotes by posting it on social media or forums.

I don’t blame Microsoft for how this is going down. This is what happens when you build complex cross-platform software that runs on multiple operating systems and targets multiple databases. It’s the same story with Visual Studio Code’s issues, where there are also over 1,000 open bugs going back to 2016.

These aren’t open source problems.
They’re software problems overall.

It’s a problem with $1 phone apps, where users resort to leaving bad reviews and developers try to do tech support by responding to the review.

It’s a problem with Microsoft SQL Server, one of the most expensive database platforms in the world. You would think that when you hit a SQL Server problem and suspect it might be a known bug, it’d be easy to go to a web page, search for the symptoms, and see a list of known bugs and the status of whether they’ve been fixed or not.

For example, say you’re hitting issues with CMEMTHREAD waits. You go to support.microsoft.com, put CMEMTHREAD in the search box, and you get this wall of text:

There’s no sorting and filtering for database versions. You’re left to click on every single post and figure out whether it’s relevant to your patch level. There’s no commenting or questions, either. At the bottom of each post, there’s a “Need more help?” search box – which just takes you back to this search result list.

And these are only the published, fixed issues.

Not the known broken issues. Those are hidden away in secret for only Microsoft employees to access.

If you’ve been working with SQL Server for a while, you might know about feedback.azure.com, a place where you can post known bugs with SQL Server in the vain hopes that Microsoft will do something about it. In rare cases, they do – but in most cases, the requests go unaddressed. And just like its predecessor, Microsoft Connect, this feedback site is going away in 2021.

Closed source software only seems like it doesn’t suffer from the extensive bugs problem because the bug list is kept hidden by the manufacturer. We users thought the software was reliable, but if you talk to the really experienced pros in the industry, they’re often furious about just how broken common features are.

The way developers and users interact
is completely broken.

I’m hopeful that someone’s going to solve these problems during my lifetime.

Stack Overflow might be one way to solve it, but they’ve got a lot more work ahead of them. Right now, the newest incoming questions feed is an insane firehose, and the DBA.StackExchange.com new questions feed, while lower in volume, is still too much for the volunteers to handle alone. Maybe Stack could solve it by doing a better job of walking users through crafting good questions, and then recruiting (and paying) vendor employees to answer questions about their products.

Microsoft’s acquisition of Github might be another way to solve it because there’s no way they’re happy with the way this is going down, either. Microsoft could tune the new-issue process at Github and start handling paid support requests through Github issues. I’m sure I wouldn’t be the only open source maintainer who would gladly charge $10 for folks to open a support case just to cause ’em to think more before asking for a pony, and offer a higher-paid issue tier with a higher service level agreement.

Or maybe it’ll be an all-new Software-as-a-Service provider, but…I don’t think that’s going to be the case, given how Microsoft experimented with UserVoice and then pulled back to their own in-house products.

I don’t know what the right answer is, but the way we’re doing it now is completely busted. I should be thankful, though, because the fact that SQL Server support is so completely broken happens to enable a delightful career in fixing that very product’s shortcomings.


Updated First Responder Kit and Consultant Toolkit for March 2021

Brad McGehee speaking to the Red Gate PASS party, 2011

Ten years ago, I unveiled sp_Blitz at the 2011 PASS Summit, in front of a live audience.

Remember the PASS Summit?

Remember live audiences?

<sigh>

Anyway, to celebrate the 20th anniversary of BrentOzar.com, I’ve been thinking back over the last couple of decades to reminisce about what I’ve learned, what’s changed, and so forth.

Open source kinda sorta existed when I got started with computers as a kid way back in the 1980s. I would buy a programming magazine, prop it up in front of the computer, and actually type programs in by hand on my Commodore 64. Lemme just tell you, that sucked. I’ve been bad at debugging for four decades.

Today, anybody in the world can install open source utilities in a matter of seconds, make their job easier, increase their value as a professional, and give back to those same open source projects. It’s not fair to say that your Github profile is the new resume – most folks just don’t have the time to contribute to open source, nor are they allowed to at their day jobs. However, I think it’s fair to say that your Github profile is one of many possible advantages when competing for a job. When someone asks, “Do you have experience with X?” it’s amazing to be able to answer, “Yes, and I’ve even contributed code and documentation to X. Here’s my Github profile, and here are some of my interactions with the team.”

So with that said, here’s this month’s new version of the First Responder Kit. It’s free – free as in speech – and it thrives thanks to the community members mentioned in the changes below. They’re real people just like you with day jobs, and they decided that they wanted to contribute or fix something. They’re the heroes that make this whole thing work.

When you find a bug or want something changed, read the contributing.md file.

To get the new version:

Consultant Toolkit Changes

I updated it to this month’s First Responder Kit, but no changes to querymanifest.json or the spreadsheet. If you’ve customized those, no changes are necessary this month: just copy your spreadsheet and querymanifest.json into the new release’s folder.

sp_Blitz Changes

  • Improvement: we ignore databases named DBAdmin, DBAtools, and DBMaintenance. (#2787, thanks Erik Darling.)
  • Fix: made all of the scripts consistently refer to https://www.brentozar.com rather than variations without https, the www domain, etc. (#2808, thanks John D. Ballentine III.)
  • Fix: removed the checks for active & inactive heaps in user databases. This is covered really well by sp_BlitzIndex now, and it’s slow in sp_Blitz. (#2818, thanks Erik Darling.)

sp_BlitzAnalysis Changes

  • This is the second release that includes sp_BlitzAnalysis by Adrian Buckman. If you want to install this script, install sp_BlitzAnalysis.sql – it’s not included in the normal installation scripts yet just to limit the blast radius in case something goes wrong. For tips on how to use it, check out the documentation. For questions or problems, see the “For Support” section of this doc.
  • Fix: expanded one of the columns from the money datatype up to decimal(38,2) to handle big data. (#2807, thanks Ian Menton.)

sp_BlitzCache Changes

  • Improvement: when @IgnoreSystemDBs = 1, we ignore queries with a context of databases named DBAdmin, DBAtools, and DBMaintenance. (#2787, thanks Iomckee.)
  • Improvement: @OutputType parameter now accepts ‘NONE’. (#2817, thanks Jefferson Elias.)

sp_BlitzFirst Changes

As we mention in the documentation, Azure SQL DB isn’t technically supported by any of the First Responder Kit scripts because Azure SQL DB doesn’t have all of the cool diagnostics stuff that we get on-premises, like sp_MSforeachdb. However, at some point in the future, we’ll officially support a lot of them up there, and this month’s changes inch us towards that.

  • Improvement: in Azure SQL DB, we now use sys.dm_db_wait_stats instead of sys.dm_os_wait_stats. (#2821)
  • Improvement: pass @OutputType parameter to sp_BlitzCache so if you specify ‘NONE’, there really is no output. (#2817, thanks Jefferson Elias.)
  • Fix: insert of changed statistics would sometimes fail. (#2738, thanks Jefferson Elias.)
  • Fix: Azure SQL DB was erroring out due to addition of sp_MSforeachdb in the recently changed statistics check. (#2819)
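To see why that DMV swap matters, here’s a hedged sketch of the same wait-stats question asked both ways: sys.dm_os_wait_stats is the instance-level view we use on-premises, and sys.dm_db_wait_stats is the database-scoped equivalent that Azure SQL DB exposes.

    /* On-premises / VM: instance-level wait stats */
    SELECT TOP 10 wait_type, wait_time_ms, waiting_tasks_count
    FROM sys.dm_os_wait_stats
    ORDER BY wait_time_ms DESC;

    /* Azure SQL DB: wait stats scoped to the current database */
    SELECT TOP 10 wait_type, wait_time_ms, waiting_tasks_count
    FROM sys.dm_db_wait_stats
    ORDER BY wait_time_ms DESC;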

sp_BlitzIndex Changes

  • Improvement: we ignore databases named DBAdmin, DBAtools, and DBMaintenance. (#2787, thanks Erik Darling.)

sp_BlitzLock Changes

sp_BlitzWho Changes

  • Fix: in the deltas view, expanded one of the columns from the money datatype up to decimal(38,2) to handle big data. (#2813, thanks Ian Menton.)

sp_DatabaseRestore Changes

  • Improvement: you can now move the full text catalogs with the new @MoveFullTextCatalogDrive parameter. (#2805, thanks Frederik Vanderhaegen.)
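Here’s a hedged sketch of what a call might look like. Only @MoveFullTextCatalogDrive comes from the changelog above – the other parameter names are the usual sp_DatabaseRestore ones, and the database name and paths are entirely made up, so check the documentation for the exact parameter list in your version:

    /* Illustrative only: restore a database, relocating data, log,
       and full text catalog files to new drives. */
    EXEC dbo.sp_DatabaseRestore
        @Database                 = 'StackOverflow',
        @BackupPathFull           = 'D:\Backups\StackOverflow\FULL\',
        @MoveDataDrive            = 'E:\Data\',
        @MoveLogDrive             = 'F:\Logs\',
        @MoveFullTextCatalogDrive = 'G:\FullText\',
        @RunRecovery              = 1;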

For Support

When you have questions about how the tools work, talk with the community in the #FirstResponderKit Slack channel. If you need a free invite, hit SQLslack.com. Be patient – it’s staffed with volunteers who have day jobs.

When you find a bug or want something changed, read the contributing.md file.

When you have a question about what the scripts found, first make sure you read the “More Details” URL for any warning you find. We put a lot of work into documentation, and we wouldn’t want someone to yell at you to go read the fine manual. After that, when you’ve still got questions about how something works in SQL Server, post a question at DBA.StackExchange.com and the community (that includes me!) will help. Include exact errors and any applicable screenshots, your SQL Server version number (including the build #), and the version of the tool you’re working with.
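If you’re not sure where to grab the version and build number, either of these will do it – both work on any supported SQL Server:

    SELECT @@VERSION;                         /* full version string */
    SELECT SERVERPROPERTY('ProductVersion');  /* build number, e.g. 15.0.xxxx.x for SQL Server 2019 */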


Our Monday Links Email is 10 Years Old.

Company News
5 Comments

I’ve been celebrating the 20th anniversary of BrentOzar.com, but it just so happens that this year marks another fun anniversary.

Ten years ago, way back in 2011, I wrote:

You’re overworked.  You don’t have the time to sit around reading blogs, keeping up on the latest industry news, and reading web comics.

That’s where we come in.  To stay on top of our game, we have to spend a couple days per week honing our skills.  We’ve started a weekly email recapping the best stuff we’ve found in the community this week.

You can read the very first one we sent, complete with our original Brent Ozar PLF branding featuring me, Jeremiah, Kendra, and Tim:

The look has changed a lot since then, but the idea is still the same: share the best links from a week of web surfing. There’s always database stuff, but also generally interesting stuff, too.

When we decided to launch a newsletter, we did not want it to be about our own content. If you want to read the stuff we’ve written, that’s what the blog is for. It didn’t make sense to craft new content and then keep it hidden behind a newsletter subscription.

Instead, we wanted to share the best stuff that other folks have written – things that you might not otherwise find. Ten years ago, the blog was already really popular in the database industry, so we knew that if we launched a newsletter, we’d get a lot of subscribers quickly. (It worked: we’ve had over 100K subscribers for quite a while.) We wanted to help other bloggers in our industry by bringing them thousands of fresh readers.

I know firsthand how hard it is to start a blog. You toil away writing a really helpful post – only to have it disappear into the ether, apparently with no readers and no comments at all. Our newsletter tries to fix that by spotlighting the good stuff – no matter whether it’s an established blogger, or someone who’s brand spankin’ new. My favorite moments are when new bloggers say excitedly, “holy smokes, you brought a lot of readers to my blog!” I look at the newsletter as a service to both the readers and to bloggers in our industry.

And it’s worked that exact same way every Monday for a decade.

If you’re not already a subscriber, you can subscribe for free here, and you can see the most recent issue here. Enjoy!


Where I See Databases Going in the Next 10 Years

Architecture, SQL Server
33 Comments

This month, to mark the 20th anniversary of BrentOzar.com, I’m stepping back and looking at the big picture rather than blogging about database problems & solutions like I usually do.

I’m going to generalize a little by decade here to keep things simple.

In 1990-2000, databases were mostly tied to specific development languages. When you developed an application with a Microsoft Access back end, you were very likely using Microsoft Access to store the data, too. If you wrote a web page in Microsoft FrontPage, you were likely using Access or Microsoft SQL Server as the back end.

1990s Brent

During 1990-2000, one of our biggest database administration problems was server sprawl. Every time we had a new app, we just spun up a new server with a new database. Most of those servers had fairly low load in terms of performance, so as licensing costs went up, we went through consolidation projects to merge multiple databases onto fewer servers to cut costs. (This continued in the 2001-2010 time frame, especially with the advent of server virtualization.)

In 2001-2010, object-relational-mapping tools (ORMs) tried to abstract the database away. In theory, developers worked with objects, and the ORM layer mapped things to the physical database storage. In theory, some ORMs were cross-platform, saying that you could switch your database back end without rewriting your code. In practice, that didn’t work very well. In the 2000s, our app code language choice still pretty much dictated the most popular database we would use for that particular language. C#? SQL Server. Java? Oracle.

During 2001-2010, Microsoft SQL Server rode the wave of .NET popularity, being the easy choice for C# developers. Microsoft’s strategy to grow SQL Server appeared to be making SQL Server a good fit for other purposes, too. It was hard to grow revenue by convincing PHP developers to switch from MySQL to Microsoft SQL Server – the cost difference just didn’t make sense.

However, it was much easier to grow revenue by convincing ETL developers to switch from Informatica to SQL Server Integration Services. SSIS was free in the box with SQL Server, so if you were using Informatica to push data into SQL Server…why waste your licensing money on Informatica? Same thing with SQL Server Analysis Services, SQL Server Reporting Services, Filestream/Filetable to use SQL Server as a file server, XML in the database, CLR code to use SQL Server as an application server, and so on.

In 2011-2020, 3 things changed how we pick databases.

In no particular order, and each independent of the others:

  1. Developers got tired of SQL’s rigid requirements and started building open source NoSQL persistence layers (MongoDB and Redis are great examples) to solve specific challenges at a very low cost.
  2. Cloud providers made it really easy to spin up a new production-quality persistence layer, like one you might not have otherwise had the skill or spare hardware to spin up, and only pay for what you use (in theory).
  3. Hosted software-as-a-service (SaaS) apps meant that often business folks didn’t even have to pick a persistence layer at all. Software vendors that used to have to support their apps on-premises on a variety of databases now just took that decision (and the support) away altogether. Instead of having to pick a database their enterprise customers would be comfortable with, they could use whatever they wanted.

As these 3 things continue to spread – and I use the present tense, because they’re still happening – Microsoft SQL Server is less and less the obvious de facto choice for C# developers. If you’re building a C# app in the cloud, it might make more sense to use:

Early 2010s Brent
  • DynamoDB if you just need key/value storage, not a full blown relational database
  • Redis if you just need a cache
  • Azure SQL DB if you just need a little code in Azure Functions, and you need relational storage
  • PostgreSQL if you want relational at a low cost without being tied into a cloud vendor
  • Just a plain ol’ file share if you want to save XML or JSON data for later analysis with a variety of tools
  • And so on.

Plus, those 3 things above lowered the barrier to using multiple persistence layers, each for its own specialized purpose. There were so many persistence layers available, at such a seemingly low cost, and without on-premises support to worry about for each of ’em, that developers became emboldened to use multiple persistence layers even inside a single application.

The architect in me loves this because we get to choose the perfect tool for each task at hand.

The business owner in me is horrified because we’re heading for something completely untenable.

2021-2030: Welcome to Sprawl 2.0.

Data lives everywhere, in redundant copies, none of which speak to each other, and all of which cost us by the byte, by the month.

Performance sucks everywhere, and every query costs us by the byte of data read.

None of these persistence layers (SQL Server included) have good cost management built in for size or performance. Most databases are still in their infancy of telling you which queries are burning the most resources, let alone telling you which queries are costing you the most money.

I got a sneak peek when I was talking to the data warehouse manager at one of my clients. He showed me their architecture diagram for their next-generation data warehouse: Azure Data Factory pulling files from sources, loading them into various databases, dumping flat files into a data lake, queried from Spark and Microsoft Big Data Clusters. He explained which parts were pay-as-you-go services, which parts were VMs, and which parts were in Kubernetes.

Then he said, “Yesterday, my costs quadrupled. The only reason we even found out so quickly is because the accounting team had an alert set up. My team has been working on it for four hours, and they’ve narrowed it down to query workloads (not storage or ADF) – but we’re realizing we don’t even have the skills to answer the question of what queries cost us the most money yesterday, or if they’re new queries, or new users, or changed database structures, or what.”

“We’re just letting people run queries and light money on fire.”

New database servers could solve this, but won’t.

When we solved Sprawl 1.0, we were at least only dealing with a handful of database platforms. We could gather all the SQL Server applications together onto fewer servers.

With Sprawl 2.0, we have:

  • Countless database platforms hosting…
  • Countless apps built with different languages and connection libraries
  • Living in a mix of different cloud vendors and on-premises

In theory, vendors could:

  • Build a single database that masquerades as multiple databases
  • Let you connect lots of existing apps down to just one platform
  • Build good cost management, making it compelling to executives
  • Maybe even build in cost reduction with automated query tuning, making it even more compelling to execs

Azure Cosmos DB is the forerunner here, I think: one database back end, but lots of APIs so that it could act as any number of different NoSQL or SQL servers. In theory – we’re talking 2021-2030 here – Microsoft could gradually enhance it so that you could take applications written for different back ends, and then consolidate them so everything lives in Cosmos DB.

But at the same time, Azure Cosmos DB is yet another new server that we have to manage, calling to mind the classic XKCD cartoon:

Similarly, but way less ambitious, Amazon’s working on Babelfish for Aurora PostgreSQL. The idea is that you’ll put both your Postgres and SQL Server databases in the same place, and connect your applications to it. It’ll service T-SQL queries out of PostgreSQL.

Look, I said it was less ambitious – I didn’t say it wasn’t ambitious.

Babelfish has a hell of an uphill battle because just SQL Server alone has an incredible amount of features, data types, and edge case gotchas. Switching your SQL Server application over to Babelfish on Aurora is going to call to mind the ORM lessons of 2001-2010. Existing applications are complex enough that in most cases, it’s just too hard to forklift their data out of one database, move it over to another, and expect everything to work. With both Cosmos DB and Babelfish, there will be case studies funded by vendors that will gloss over the complexity, but…these migrations will be few and far between.

(Just because someone in the audience will ask – no, Azure Arc doesn’t solve this problem of database sprawl. If anything, it enables sprawl because it encourages self-provisioning of databases & servers & services, both in the cloud and on-premises.)

At the same time that vendors are trying to build one-size-fits-all databases, they’re going to be in a race against sprawl. More new specialized databases will pop up. Existing databases will get new capabilities. The centralized, sprawl-reducing databases won’t be able to keep up.

Sprawl 2.0 is the new normal.

As a database administrator, your job isn’t just to manage the configuration, health, and performance of the database servers you have today.

In 2021-2030, your job is to:

  • Watch the options for persistence layers out there, reading developer news sites to see what new options are popping up, where they’re succeeding, and where they’re falling short
  • Help your team evaluate new persistence layers by discussing their advantages and disadvantages compared to the persistence layers you’re already using
  • Encourage the use of existing persistence layers rather than picking new ones – because every new one that comes in, you’re gonna have to learn to manage it, and you’re not exactly sitting around bored, and the company’s not hiring help for you, either, because the sales pitch on all these new persistence layers is that they’re all self-managing – and you know how that goes

You don’t have to develop a level of expertise on all of the options out there – it’s just too hard, and the new players change too quickly. For example, it’s a waste of your time to try to master a new platform like Cosmos DB if your company isn’t actually using it yet. However, you wanna learn about where it’s a good fit and what its weaknesses are – so that when your developers start talking about using it, you can understand its strengths and weaknesses relative to the ones you’ve already got.

As a blogger, I hate telling you to do something without giving you references to actually do it. In this example, I should be a good blogger and say, “Now go here to learn about Cosmos DB’s strengths and weaknesses.” The problem is that I just don’t have a good list of resources for every persistence layer out there, and I don’t have the time to go build it, either. I can’t even recommend vendor documentation sites because they’re so biased in favor of the product.

I’d leave you with one piece of advice: when you want to learn more about a database, Google for:

  • (database name) release notes – and then read about the known weaknesses or bugs.
  • (database name) limitations – and look for features that are important to you.
  • (database name) site:news.ycombinator.com – because Hacker News surfaces a lot of interesting pro and con stories. You have to take these with a grain of salt – but the same thing is true for vendor documentation, too.
  • If it’s available, read the database’s Jepsen analysis to learn about edge case failures
  • And finally, if it’s open source, find its Github repo, and read the list of issues. Sort them by most commented, and see what problems the community has run into. Pay particular attention to how long those issues have been open, and what kind of progress the maintainers are making.

Because it’s really easy to start building a database or an online service.

It’s much harder to do it well.

And if you thought it was hard to manage just one, wait til you see how hard it is to manage a whole bunch of half-ass persistence layers.


It’s Been Decades. Why Don’t Databases Manage Themselves Yet?

SQL Server
26 Comments

This month, to mark the 20th anniversary of BrentOzar.com, I’m stepping back and looking at the big picture rather than blogging about database problems & solutions like I usually do.

A much younger me

When I started working with databases, we had to:

  • Provision the right CPUs and memory
  • Provision the right storage throughput and size
  • Install the database
  • Configure the database
  • Protect it – which sounds easy, but that also includes designing the right high availability and disaster recovery solution
  • Design the right tables, and put indexes on them based on how we wanted to query the data
  • Write fast, accurate queries to load & retrieve data
  • Troubleshoot the whole thing when it went wrong

Today, decades later…yeah.

We’re still doing all of that stuff. It’s just that we’re cave-people using rocks for tools instead of banging on things by hand.

Vendors aren’t focused on these problems.

Oh sure, they’re marketing that they’ve built something better, but when you start to rub your finger on the product, the wet paint wipes off and you see the conventional problems underneath.

Let’s take something really simple: let’s build a new Azure SQL DB:

Configuring a new database in Azure SQL DB

 

You get sliders for vCores and data max size.

But notice the number with the biggest font size on the screen: 150. Your choices for vCores and data size determine log space allocated, and it’s important enough that Microsoft puts it in bold, but you can’t actually control it directly. If you want more log space, you have to increase data space.

So if it’s important enough to put in bold, we must know the right number for it, right?

Nope. In Azure SQL DB, log space consumption is something that’s controlled only by the database server, not by administrators. You can’t fix it by backing up more frequently: Microsoft controls your backups. Microsoft is hinting that, “Yo buddy, you better know how large your transactions are and how far behind your AG secondaries get, and if you get this wrong, inserts/updates/deletes will fail.” They’re supposed to have fixed this with Accelerated Database Recovery, which the documentation says is on in Azure SQL DB and can’t even be turned off – but it’s still an issue, or it wouldn’t be the number in the biggest font on the screen. Or even worse, if it is fixed, why is it being featured so prominently, suggesting to users that they need to build a bigger server to handle a log file size that doesn’t matter anymore?
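If you want to check where you stand on either of those fronts, a couple of one-liners help. These DMV and column names are current for SQL Server 2019 and Azure SQL DB as I understand them, but double-check the docs for your tier:

    /* How much of the allocated log space is actually in use right now? */
    SELECT total_log_size_in_bytes / 1048576 AS log_size_mb,
           used_log_space_in_percent
    FROM sys.dm_db_log_space_usage;

    /* Is Accelerated Database Recovery really on for this database? */
    SELECT name, is_accelerated_database_recovery_on
    FROM sys.databases
    WHERE name = DB_NAME();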

<sigh>

It’s 2021, and we’re back to 2001.

We’re still screwing around with micromanaging transaction log sizes, trying not to run out of disk space. On premises, we learned the easy fix long ago: just set log file sizes reasonably, and then leave autogrow on. If one database out of fifty happens to have a runaway transaction, its log can grow temporarily. We’ll deal with it.
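For the on-premises fix I’m describing, it’s literally a one-time statement per database – the database and logical file names here are hypothetical:

    /* Pre-size the log to something sane, then let autogrow (in fixed-size
       chunks, not percentages) handle the rare runaway transaction. */
    ALTER DATABASE StackOverflow
        MODIFY FILE (NAME = StackOverflow_log, SIZE = 8GB, FILEGROWTH = 1GB);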

Scaling CPU? Self-re-configuring? Nope.

Forget about storage – what about sudden demands for CPU or memory? Serverless, self-scaling databases like AWS Aurora Serverless and Azure SQL DB Serverless say they solved it with auto-scaling. That’s not solving the problem – it’s just unlimited provisioning. You wake up to a $100,000 cloud bill when some bot scrapes your site or when an app ships with an N+1 bug.

I know first-hand because our own SQL ConstantCare® product is built atop Aurora. Last year, I blogged about how a single database view cost me $50/day for months, and you know what? I’m in that same exact boat again today. Our database costs gradually crept up because we deployed more and more queries for new features, and some of the new queries weren’t well-tuned. I had an item in my backlog for weeks to go do more database tuning, and our Aurora costs crept up to over $200/day. I had to go bust open AWS Performance Insights, find the queries involved, tune them, check them into Github, have Richie fix the unit tests, and then deploy it.

Self-configuring databases? Not even close: Microsoft SQL Server 2019 and its cloud equivalent, Azure SQL DB, still ship with the same Cost Threshold for Parallelism setting that every freakin’ setup checklist still tells you to fix manually. Every release, it seems like we get thrown a bone for one or two configuration settings just so the marketing team can say the database is becoming more self-configuring. That’s great, but at this rate, it’s going to be 2070 before setup checklists can go away – if ever, because vendors keep adding features and dials that need extensive tweaking.
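And yes, it really is still a manual sp_configure call. The 50 below is a common community starting point, not an official Microsoft number – measure your own workload before you settle on a value:

    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'cost threshold for parallelism', 50;  /* the default is still 5 */
    RECONFIGURE;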

Self-patching? Well, kinda, but only in the worst way. In Azure, you don’t get a truthful changelog, and you don’t get to test patches before they’re applied to your production server, or control when that happens. Just this month, in March of 2021, Azure SQL DB finally announced that you can pick what window they’ll use for maintenance to take your server down, and get alerted before they’re gonna do it. You can choose between:

A kinda-younger me & friends, Chicago 2012
  • Every day, 5PM-8AM local time, or
  • Mon-Thurs 10PM-6AM
  • Fri-Sun 10PM-6AM

That’s it. Those are your only choices. And that feature is only in preview, and only in select regions. If vendors really cared about database administrators, this stuff never would have gone into production like this.

Somebody in the audience is going to say, “Does Azure Arc help?” Well, it purports to, in the sense that it puts both your on-premises SQL Servers and Azure instances into the same Azure control panel, gives you self-provisioning, and has policy-based management. Given Microsoft’s track record with Policy-Based Management and the Utility Control Point, both of which were grandly-announced features that quietly died off without getting fixes or improvements, I’m gonna wait to see if this is another one-and-done turkey.

Vendors are focused on selling,
and database administrators don’t buy databases.

As important as you think you are, you’ve likely never written a check for database licensing or cloud services.

The reason you have a job is that your employer implemented a database, it grew over time, and it grew beyond the self-managing capabilities touted by the brochure. You were hired long after the database choice was made. And no, you’re not really able to influence future purchasing decisions because that ship has sailed: your company picked a persistence layer, and when it comes time to pick additional persistence layers for new products, you’re only going to be one of many people sitting at the table.

Databases are bought by two roles: developers and executives.

When developers make database decisions, in most cases, they’re not experienced enough with multiple database platforms. They’re either familiar with one, and that’s the one they pick, or they’re not familiar with any, and they’re forced to make a selection based on the vendors’ marketing material, what their friends say, what they read on HN, or some meaningless database rankings site.

When executives make database decisions, they either work from feature lists like EnterpriseReady.io, looking for things like auditing, role-based access, Active Directory integration, or else…they work off the vendor’s brochure, and self-managing is just another feature in a feature list.

Developers and executives buy features.

I don’t mean this in a negative way! It’s the same way we buy appliances or cars or laptops. We have a rough idea of the features we need. Products rarely have all of the features we want – at least at the price point we’re willing to pay – and so we have to make compromises. We might accept a half-baked version of one feature because we want another feature so badly.

In 2021, every database’s brochure has “self-managing” slapped on the brochure somewhere, and every vendor will show just enough demos to hoodwink non-DBAs. “Self-managing” doesn’t have to be fully true in order to sell – just like “secure” or “performant” or “scalable.” Everybody uses these keywords. They don’t mean anything.

Even worse, self-managing is just one of the features people are looking for. Every time a database vendor slaps on another feature, that feature has dials and switches to configure it – and when the feature is brand new, those dials and switches have to be configured by hand. There’s never a best practices doc – the feature has to ship first before the real world learns the best practices, and we all write documentation to tell each other how to set the dials and switches.

Want proof? Check out the home page of Couchbase.com. Look at the kinds of features that they tout. They emphasize the power of the product, empowering you to do all kinds of things – but you still have to actually do those things, like size it, monitor & tune the queries, configure the security, and configure all the settings. It simply isn’t self-managing – and it doesn’t matter, because people love it. They’re going public with a $3 billion valuation. The market has spoken, and features are where the money is.

It’s a race between new features
and self-managing those features.

Sure, vendors can say that a brand-new feature is self-managing – but again, rub your finger on it, read the release notes, run an edge case performance test, and you’ll pretty quickly find the undocumented dials and switches that need to be turned just so in order to make this pig fly.

New features will always win the race.

They have to. Vendors are in their own race against other vendors, racing to ship features that nobody else has, or to keep up with features that someone else has already shipped.

This influenced my own career:
I gave up on production DBA work.

Ten years ago, when Microsoft first brought out Always On Availability Groups, I was ecstatic. I was doing a lot of production database administration work at the time – making sure SQL Servers were highly available and well-protected – and I saw Availability Groups as a better way to achieve that goal.

But then the features versus self-managing race started.

The good news was that Microsoft poured resources into AGs, adding all kinds of features to them. More replicas. Cross-database transaction support. Availability Groups that spanned multiple Windows clusters.

The bad news was that not only were these features not self-managing, they were practically unmanageable: the documentation was terrible, the GUI was non-existent, the damn thing broke constantly in crazy unexpected ways. (For fun details, read the past blog posts on TarynPivots.com, Stack Overflow’s DBA.)

By the time Microsoft said they were bringing Availability Groups to Linux, I said okay, I give, no more, time out. I had come to see SQL Server high availability as a Sisyphean task: I could never really be done with that miserable work, and even when I thought I was done, some new surprise knocked me flat out. Even worse, those surprises always came after hours, on weekends, on holidays, when all I really wanted to do was relax.

If you do want to learn Availability Groups, Edwin Sarmiento is about to open a new rotation of his class. He only does a few of these per year, and it’s a hybrid of consulting, coaching, and training. Learn more here.

I focused on development DBA work instead.

Here’s a quick rundown of the difference between developer, development DBA, and production DBA:

I chose to move from Production DBA to Development DBA. These days, my consulting work and my training classes focus on solving performance emergencies. I can at least schedule those engagements in advance because companies know when their performance is slowly getting worse, and we can work together during weekdays, business hours, in order to turn things around. (As opposed to production outages, which always hit at the worst possible times when I’ve just made a fresh gin & tonic.)

Production DBA work is still there, mind you, and every month I get emails from folks asking me to record new training classes on how to do it. I’m just flat out not interested in that work, and I’m not interested in doing training on it anymore.

So with that in mind, I put our Fundamentals of Database Administration class and Senior DBA Class on YouTube, for free, forever. They’re older classes, and they don’t have the best audio & video quality, but I’d rather just put them out there for the public to help the folks that need ’em. Hope that helps – and I also hope that database vendors start taking the self-managing thing more seriously, for the sake of production DBAs that still have to deal with this mess.

In my next post, I’ll talk about what I think the next 10 years of databases will bring, and how that affects our job duties and career prospects.


What Does a Database Administrator Actually Do?

BrentOzar.com is 20 years old, and I’m celebrating by being a little retrospective and future-looking this month. Today, I’m publishing a post that I could have published 10 years ago, or maybe 20 years ago, and maybe even 10 years from now. How much of this would have changed in the past? How much will change in the future? Think about that as you read through this. Here we go…

A DBA makes sure the databases are backed up, well-configured, patched, secure, and fast.

I explain my job to acquaintances by saying, “You know how some web sites are really slow, and their checkout times out, or it loses your data? It’s my responsibility to make sure those problems don’t happen. I’m not the only person involved – there are also developers and engineers – but I’m responsible for the part that holds your data, like your profile, your orders, your browsing history, and all the data that the evil companies have been gathering about you. I’m not sure I’m supposed to say that last part out loud.”

The job duties of a database administrator include:

  • Ensure all database servers are backed up in a way that meets the business’s Recovery Point Objectives (RPO) – see the sketch after this list
  • Test backups to ensure we can meet the business’s Recovery Time Objectives (RTO)
  • Troubleshoot database service outages as they occur, including after-hours and weekends
  • Configure monitoring utilities to minimize false alarms
  • As new systems are brought in-house, choose whether to use clustering, log shipping, mirroring, Windows Azure, or other technologies
  • Install and configure new database servers
  • Patch existing database servers
  • Deploy database change scripts provided by third party vendors
  • When performance issues arise, determine the most effective way to increase performance including hardware purchases, server configuration changes, or index/query changes
  • Document the company’s database environment
  • Help the company lower their hosting costs by performance tuning, consolidating, and re-sizing
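As a sketch of that first duty – comparing actual backup history against the business’s RPO – here’s the kind of query a DBA might run against msdb. It’s illustrative only; type ‘D’ rows in backupset are full backups:

    /* When was each database last backed up in full? Compare the gap
       against the business's RPO. */
    SELECT d.name,
           MAX(b.backup_finish_date) AS last_full_backup
    FROM sys.databases AS d
    LEFT JOIN msdb.dbo.backupset AS b
           ON b.database_name = d.name
          AND b.type = 'D'
    GROUP BY d.name
    ORDER BY last_full_backup;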

Why is a database administrator important?

When the database goes down, the business stops. If you’re selling things online, and your web site goes down, that’s an emergency. Even if it’s not down – if it just slows down – that can be an emergency too. DBAs help prevent that problem.

When someone accidentally deletes data, the business can stop. Unfortunately, in all too many companies, there are too many folks with access to that data – and those folks are often untrained. They’re just human, and they make mistakes. Database administrators help recover from that problem.

Where do database administrators work?

Since DBAs are involved with protecting important data, you’ll find them everywhere data is important – and these days, that’s everywhere! However, most companies with just 1-2 databases don’t have enough work to keep a full time employee busy. In these kinds of shops, you’ll find a person who spends some of their time doing database administration. We call those folks “accidental DBAs” – they didn’t mean to become DBAs, but they just inherited some of the duties.

The larger a company is, the more DBAs they’ll have. It’s common for big national companies to have entire teams of DBAs.

What’s a typical database administrator salary?

According to our annual Data Professional Salary Survey, DBAs in the United States make an average of $112,880. To slice and dice those numbers or dig into other countries, download the raw data in an Excel spreadsheet.

As with any other profession, folks make more money when they have more experience or live in higher cost-of-living areas.

Can remote database administrators work from home?

They can – and let’s face it, right now everybody’s working from home anyway – but most companies prefer to have their DBAs show up in the office. There’s a level of paranoia around the DBA position because DBAs protect one of the company’s most precious assets: their data. DBAs get pulled into a lot of ad-hoc meetings due to emergencies, and due to the level of severity of those emergencies, managers wanna see faces and lock people in rooms.

Some companies are willing to hire purely remote DBAs, but that’s pretty few and far between. When they do, it’s usually because:

  • The company is in a hard-to-reach location – for example, one of my clients is in a small town, and they simply haven’t been able to find DBAs in their local market, nor have DBAs been willing to relocate to this town. The company had to settle for a couple of remote DBAs instead.
  • The company wants very specific hard-to-find skills – for example, one of my clients needs serious production experience with Distributed Availability Groups, a rarely-used feature, and there just aren’t that many DBAs in the world with those skills. The folks who have it can dictate their job requirements, like working from home.

But these are exceptions rather than the rule.

What’s a database administrator versus a data analyst?

A data analyst writes queries to get answers from the data, whereas a database administrator makes sure that the answers come back quickly.

A data analyst spends a lot of their time writing new queries from scratch using SQL, reporting tools, or data science tools. They usually work directly with business users, and they know more about what the business does. They know what the columns in the tables mean, and they know how the different tables relate to each other. They also spend time interpreting those results: writing reports, talking to the business users, and clarifying requirements.

A database administrator usually has less business-specific knowledge. They know more about systems that hold the data, like Microsoft SQL Server, Postgres, MySQL, Oracle, etc. A DBA will know which users run which queries the most, and they may even help the data analysts craft better queries, but they’re not usually writing the new queries from scratch.

This gets a little confusing because some businesses just slap the term DBA on everything. “If they’re working in the database, they’re database administrators,” hahaha.

What’s the difference between a junior DBA and a senior DBA?

A junior DBA is doing almost all of their tasks for the first time. They rely on Google for pretty much every task they need to accomplish.

A senior DBA has spent a couple/few years working as a junior DBA. They’re used to doing the common tasks like configuring backups, troubleshooting outages, managing security, and responding to end user requests. That’s not to say they don’t still do tasks for the first time – they often do, and they rely on Google to help them figure things out. But at least half the time, they’re doing things they’ve already done before, and rather than Googling the concepts, they’re Googling the most efficient way to do it at scale, like automating it with PowerShell.

The number of years of experience has nothing to do with whether someone’s a junior or a senior. I’ve seen a lot of people who have 10 years of experience, but they’re still a junior – because they’re still doing exactly the same things they did 10 years ago, the same way they did it back then. They haven’t done any learning on how to automate the more common tasks, or how to accomplish more challenging ones.

What tools do database administrators use?

When administering a single SQL Server, they usually use SQL Server Management Studio. It’s a free tool from Microsoft that’s over a decade old, and was written from the start as the main console for the database server. It’s fairly uncommon to see DBAs using third party tools to manage servers just because SSMS has been out for so long and covers so much functionality. I spend most of my time in here.

When administering lots of SQL Servers, DBAs use PowerShell, usually with the DBAtools.io framework. You can get started learning that with the book Learn PowerShell Scripting in a Month of Lunches.

For writing new code, Microsoft wants us to use Azure Data Studio. It’s like Visual Studio Code, a cross-platform app targeted at writing new stuff from scratch. It can do some administration tasks, but it’s really targeted at developers. I spend a little of my time in here because I write queries for both Microsoft SQL Server and Amazon RDS Aurora Postgres, and Azure Data Studio works with both of those.

What training prepares you to get the first DBA job?

Companies almost never hire DBAs without on-the-job experience. To get that experience, there are two common career paths.

Get a job as a developer first, and spend a lot of time querying SQL Server. As your job progresses, you’ll eventually design new SQL Server objects like tables, indexes, and stored procedures. Then you can gradually become “the database person” in the office, and if it makes sense for your career, gradually transition into a dedicated database administration role.

Or, get a job as a sysadmin first, and as part of your overall server administration duties, you’ll come in contact with database servers. As your job progresses, you’ll install new database servers, patch existing ones, troubleshoot them when they break. You can then gradually become “the database server person” in the office, and if it makes sense for your career, transition into becoming a DBA.

In both cases, you’ll have a hole in your knowledge. Developers may not know the installation & configuration side. Sysadmins may not know the querying side. That’s okay – you can learn the remainder of it when you become a full time DBA.

If you’d like a jump start, I have free YouTube courses on the Fundamentals of Database Administration, and a free Senior DBA Class.

Can freshers get a job as a database administrator?

If you’re fresh out of college, you’re unlikely to go directly into a DBA job. You’re better off getting a developer or sysadmin job as I described above, then working your way into a DBA position.

When you do find companies hiring freshers as remote DBAs, they’re usually training the employee for basic DBA job duties. Be careful here: these jobs are extremely low-paying, and they’re probably not going to get you a great DBA job after that. You’re going to have to leave the company to get a better job, but when you take the “expertise” that you learned from this company and try to use it to get the next DBA job, you’re likely to be disappointed. The company was teaching you a bare minimum of things just to bill for your time, and it’s not really seen as real-world DBA experience.

Are database administration certifications enough to get a job?

In the Microsoft SQL Server world, no: Microsoft’s certification programs like the Azure Database Administrator Associate just don’t accurately reflect what database administrators do. They reflect what Microsoft wants DBAs to learn – because Microsoft wants people to evangelize Microsoft cloud services inside their company.

Sadly, Microsoft certifications just aren’t a good path to get your first DBA job. Instead, focus on getting a job as a developer or sysadmin first, then become “the database person” in your shop, and go from there.

Can you be a database administrator without a college degree?

Absolutely – I dropped out of college myself. However, you’ve gotta be willing to put in the years of work doing development and systems administration first before you get a DBA job.

I wouldn’t use myself as an example of “the right way” to get a DBA job. It’s possible – it just takes a long, long time. I would also say that I’m a DBA because I dropped out of development! I didn’t have the patience to continuously learn new languages and frameworks, and debug in all kinds of different browsers. Database administration was easier for me, so I stuck with that.

If you don’t have a degree, take heart: just get work as a developer or systems administrator instead. It’s going to be easier (not easy, just easier than jumping straight into database administration), and you can keep following the things you love to do while avoiding the things you hate.

Will DBAs still have a job in the cloud?

I get this question a lot as companies start to consider cloud services like Azure SQL DB.

For the answer, read back through that list of job duties at the top of the post. Several of ’em are indeed covered by self-managing cloud database platforms, but many are not. (For example, if you want to keep your backups longer than 35 days, that’s currently only in preview, and in only some Azure data centers.) In addition, performance pricing starts to become a real hassle in the cloud: as your workloads grow, so do your costs. DBAs can now pay for themselves by helping performance tune workloads to reduce hosting costs.

Is database administrator a good career?

If you’re willing to put in the years of work that it takes in order to gradually move into a DBA position, it’s fabulous. You’re the center of a company’s data, which means you know what’s going on everywhere. You know all of the IT staff, you know what projects are happening, and you can help empower everyone to get their job done.

However, it comes with drawbacks:

  • It’s hard to get your first DBA job
  • It’s hard to graduate from junior to senior DBA without changing companies
  • Most DBA positions are on-call, and the on-call work can be exhausting
  • Some DBA positions have tons of responsibilities, but no authority to change things for the better: you’re stuck maintaining a broken system

It’s been a solid career for the last couple of decades, and things look pretty rosy for the 2020s and 2030s. In my next post, I’ll talk about why databases aren’t self-managing yet, and how that affected my last 10 years.

