Pop Quiz: What Do These Things Cost Today?

Hardware
17 Comments

Price check on Aisle 2021.

My main consulting job is a 2-day SQL Critical Care® where I help turn around slow SQL Servers. Companies come to me because they’re not sure if it’s a hardware problem, a database configuration issue, bad app code, or what, and they’ve been getting conflicting answers from different teams.

Before you spend money on consulting, though, stop to review what things cost.

Often, I meet a whole team of people who have been struggling with slow performance – only to find they’ve got less hardware than my laptop, and that a few thousand bucks would make a really dramatic difference instantly.

So let’s do a price check. Without searching the web for prices, guess how much the things below cost – and you don’t even have to guess the exact price, just put ’em into buckets for perspective.

If you don’t see a form above, like if you’re reading the blog via RSS or email newsletter, you’ll need to visit my blog to take the quiz.

Got questions? Surprised? Leave a note in the comments.


Never, Ever, Ever Start T-SQL Comments with Two Dashes

T-SQL
117 Comments

There are two ways you can write comments in T-SQL – dash-dash comments and framed comments. Here’s a simple example of each (the code samples in this post are my illustrations, using the Stack Overflow database’s dbo.Users table):
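    -- This is a dash-dash comment.
    /* This is a framed comment. */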

Never, ever use two dashes. Never.

The problem is that monitoring tools and diagnostic management views often dump out the T-SQL in one long line. So this query:
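    SELECT u.DisplayName, u.Reputation
    -- just the high-reputation users
    FROM dbo.Users u
    WHERE u.Reputation > 1000
    ORDER BY u.Reputation DESC;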

Becomes this:
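    SELECT u.DisplayName, u.Reputation -- just the high-reputation users FROM dbo.Users u WHERE u.Reputation > 1000 ORDER BY u.Reputation DESC;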

And you can’t tell where the query really ends. This REALLY causes problems with real-world-size queries with joins, CTEs, etc – stick a dash-dash comment anywhere in the query, and you’ll break query-formatting tools like SQL Prompt.

Instead, frame out your comments:
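    SELECT u.DisplayName, u.Reputation
    /* just the high-reputation users */
    FROM dbo.Users u
    WHERE u.Reputation > 1000
    ORDER BY u.Reputation DESC;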

That way, even if the monitoring tool or DMV dumps everything onto one line:
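    SELECT u.DisplayName, u.Reputation /* just the high-reputation users */ FROM dbo.Users u WHERE u.Reputation > 1000 ORDER BY u.Reputation DESC;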

It’s still obvious what the query is doing, and T-SQL formatting tools can still rework it with proper line breaks.

I know some of y’all will want to lecture me about getting a better monitoring tool, but you wouldn’t believe what kinds of crappy tools are out there, and some DBAs just have to live with those tools. Frame your comments, and your successors will thank you. (I say “successors” not because they’re successful. I’m not saying they’re better than you. They just came after you, trailblazer.)


Free Columnstore Training Class on Tuesday

Your report queries are too slow.

You’ve tried throwing some hardware at it: your production SQL Server has 12 CPU cores or more, 128GB RAM, and SQL Server 2016 or newer. It’s still not enough to handle your growing data. It’s already up over 250GB, and they’re not letting you purge old data.

Will columnstore indexes help?

Next Tuesday, you’ll learn:

  • How columnstore data is stored, and how that impacts your architecture choices
  • How to do a hands-on self-assessment of your workloads, data, and server to figure out whether columnstore indexes make sense for you
  • Why partitioning makes so much sense for columnstore indexes
  • How to do a proof-of-concept implementation with initial creation, querying, ongoing loads, and index maintenance
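
If you’ve never created a columnstore index, the syntax itself isn’t the hard part – here’s a minimal sketch against a hypothetical copy of the Stack Overflow dbo.Posts table. The hard part is knowing when it’s the right choice, and that’s what the class is about:

    CREATE CLUSTERED COLUMNSTORE INDEX CCI_Posts
        ON dbo.Posts;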

No registration required – just add this event to your calendar, then show up at BrentOzar.com/training/live/ next Tuesday at 8AM UTC. This is a Europe-friendly time zone class – folks in other parts of the world will either need to get up at an oddball time, or else buy the recordings.

If you’d like to follow along with the demos during class – totally optional, not required – here’s how to set up a lab VM. This requires a specialized full-size 300GB copy of the Stack Overflow database. This is not the kind of class where you wanna follow along on a low-end laptop: this requires the Mastering-level setup because columnstore is all about big data.

See you in class!


How My Attitude and Branding Changed Over 20 Years

This month, to mark the 20th anniversary of BrentOzar.com, I’m stepping back and looking at the big picture rather than blogging about database problems & solutions like I usually do.

It wouldn’t be a good navel-gazing series without a couple thousand words looking inward at myself.

When I first got started writing online, I was writing for myself. I blogged because it was enjoyable. I just liked writing. (I still do.)

10 years ago, I wanted to be 3 things.

Technical, approachable, likable

In the late 2000s, I started thinking more about branding. I joined Quest Software in 2008, and I spent a lot of time working with the marketing team. Smart, friendly people like Christian Hasker, Andy Grant, and Heather Eichmann got me to think more about how my blog looked to someone who was just walking in the front door for the first time, so to speak.

I went through a branding exercise to come up with 3 words to describe my online persona, and I came up with:

  • Technical – I wanted to write technical details about the real-world problems I ran into and how to solve them. This sounds obvious in retrospect, given the blog that you’re reading, but in the early years, my writing was all over the place, like buying pantyhose to build a turtle tank filter.
  • Approachable – I wanted readers to feel like they could talk to me about my work and their own database problems. I didn’t want to seem like I was locked in an ivory tower.
  • Likable – I wanted readers to laugh, enjoy their time with me, and come back for more.

I repeated that mantra over and over. I stopped writing stuff that wasn’t technical, I showed up on every social media platform, I got to know everybody I could in the SQL Server community, and I tried to write content everybody would love, universally. My SQL 2008 R2 Review post was a good example of what I aimed for during this period.

Over time, I gave up on approachable.

Thinking about smashing

I realized that being approachable meant that people would actually approach me. That sounds great in theory, but there was an endless stream of people who wanted me to do their work.

For free.

Right away.

I don’t remember the first time I got an email that said something like, “My client needs me to do ___, please walk me through it.” I wish I’d have had a video of me reading those words. Was I dumbfounded? Suspicious? I don’t remember the first time, but wow, did those emails come in like crazy over the years.

While I wanted to be both approachable and likable, I had zero interest in doing someone else’s job for free and being taken advantage of. At first, I kinda politely said, “That’s a great idea for a blog post. I’ll make a note of that and touch on it at some point in the future.” I pretty quickly abandoned that and just used a set of Gmail canned responses.

I still wanted to be approachable, but…I didn’t want that to be one of my 3 core branding words. If that was one of the first things people thought about when they thought of me, they would bring me every tiny problem for free as if I was their virtual coworker, and I simply couldn’t afford to do that. I wouldn’t have any family time left.

Over time, I gave up on likable too.

My smile hasn’t changed

Let’s be honest here, dear reader, because you know me pretty well: we both know I’m sarcastic.

It was easy to keep the sarcasm toned down in blog posts because I’m a huge believer in scheduling posts ahead of time. If I click Schedule instead of Post, then I have more time to think over the contents. I’ll come back to a post a day or a week later, contemplate the contents, and then take out the sharpest humor.

As I started spending more time on social media, especially Twitter, the sarcasm shone right through. I would see a tweet, respond sarcastically immediately, and then regret it a day later. (Or not.) I can sit here and say, “That person deserved a sharp reply so they could see the error of their ways,” but we’re talking about me here, and the problem is that I’m sarcastic.

Microsoft would bring out a feature that didn’t make any sense, and I’d tweet about it sarcastically. The Professional Association for SQL Server would shoot themselves in the foot again, and I’d take to social media with some witty banter. The problem was that I was gradually offending a lot of people in my industry: they might forget about their bad decision, but they wouldn’t forget the way I’d tweeted publicly about it.

A big part of the problem: my metrics. I viewed the blog as the big megaphone (because I’ve got over 100K email subscribers and tons of web hits), and Twitter as the quieter water cooler chat, ephemeral stuff. However, someone took me aside one day and said, “You realize you’ve got more than twice the Twitter followers of anybody else in our industry – it’s not even close – and when you say something kinda casually on Twitter, tens of thousands of people can hear it?” That sounds like a humblebrag, but it’s not: I’m humbleashamed. Twitter was a big megaphone too, and as I looked back, I felt really bad about the way I’d used it over the years because it worked against the likable goal. (Those last few words are important, though.)

That moment caused me to rethink the “likable” part of my branding.

I could have pivoted and said, “I’m gonna turn this ship around, and be genuinely likable again online.” But the reality is that the damage was done: nobody in the industry was ever gonna see me as Mother Teresa, and let’s be honest: I’m not Mother Teresa. I can’t market something I’m not. (I mean, I could. But that goes against my core values.) Lemme zoom way out for a second.

Your life is like a blank white canvas.

Whether we’re talking about your career, your job skills, a relationship, your online reputation, whatever, the same concept applies: you start with an empty canvas. Over time, every action you take fills that canvas in, painting a picture.

From time to time, you need to step back and survey your work. Are you painting the right picture? Have you been obsessing over a tiny detail that doesn’t really matter? Do you need to take some major corrective actions? Is there anything you need to paint over? Do you need to change your approach to the rest of the canvas given what you’ve painted so far?

Let’s take job skills, for example. I got started in hotels, then used my hotel experience as a developer & sysadmin for hotel companies. I used that experience to become a database administrator, and filled in more space with work in virtualization, storage, cloud, startups, etc. As long as I kept building on top of that same picture, I could make it better and better over time.

You don’t really get another canvas.

If you want to start painting a different picture, you can take the space you have left and start there – but you only have so much time and mental space. Your new work is going to be heavily influenced by the work you’ve already done on that canvas. If you hit reset on your skills and try a totally different industry, you might paint a smaller, better picture for yourself – for however you define “better” – but it’s going to be smaller in scope. (My Dad did that, starting over as a nurse after decades in business, and I’ve always admired the work and dedication that it required.)

Similarly, branding is like that: if you’ve built up a large online following that sees you a certain way, and you wanna change it, you’ve already got a partially-filled-in canvas. Plus, you gotta keep in mind that you painted this stuff: it’s your style. You might just want to accept yourself for who you are and run with it.

A Dita Von Teese quote helped me decide.

Around the same time I was having the epiphany around the “likable” part of my branding, I saw this quote of hers:

“You can be the ripest, juiciest peach in the world, and there’s still going to be somebody who hates peaches.”

That quote right there changed my life.

It rang so true for me because I thought back about my experience with sp_Blitz. When I first launched it, there were a few people who griped, “It’s free as in beer, but not free as in speech – it’s not really open source.” When I open sourced it with the MIT license, they griped, “He’s still collecting emails to download it.” When I put it on Github, they griped, “Well, it still has his name in it.” I wanted them to like me, but it was never good enough, and I had to stop beating myself up. Some people just aren’t going to like me and what I do, full stop, and Dita’s quote taught me that that’s okay.

I thought about the people that I admired at the time, and for every one of ’em, there were plenty of haters. But who cares? *I* liked them, and so did hundreds of thousands (or millions) of other people. They made my life more enjoyable and they taught me things. There’s no such thing as a media personality that everyone likes. (I used to think Oprah was an exception – everybody likes Oprah, right? – but the aftermath of the Harry and Meghan interview taught me that even Oprah has her haters.)

In 2016, I came up with 3 new attributes.

When you’re writing code or managing a project, you need to know where you’re going. Blogging, writing presentations, and live streaming are no different: you need to understand the end result you want to achieve, and have a meaningful path to get there. So in 2016, I revisited my branding words to come up with a new strategy for myself, and I came up with:

  • Technical – no changes here. I do share lifestyle & business stuff on my Instagram, my TikTok, and Ozar.me, but those are just hobbies for my own fulfillment, kinda like how blogging was for me back in the early 2000s.

  • Pain-relieving – I really work hard to focus my training and consulting on the most urgent, most relevant things that will make your SQL Server pains go away. I’m rarely about learning just for the sake of learning: the things you spend time learning and doing need to immediately pay off in a happier SQL Server.
  • To the point – your time is valuable, and so is mine. We’re both going to cut through the BS as quickly as possible in order to get what we both need to succeed. I want you to be able to stop working with me as quickly as possible so that you can go back to shipping value for your customers or your own career. The faster I can get your server fixed or get the right knowledge into your brain, the better.

(And, uh, this blog post is none of those three. But look, I’m allowed to go off-topic here now and then. This is the 20th anniversary of BrentOzar.com, and I warned you that this series would be navel-gazing.)

The last one is the biggest change over the last 10 years.

Screw civility: if I think something is a waste of your time, I’m going to be up front and tell you right away in no uncertain terms.

I’ll use an easy, obvious example: Microsoft’s certifications have been a waste of your time and money for years. They’ve changed the certification paths, branding, and names so many times that there’s no market awareness and no value to your career. The only time you should invest your precious time in those is if your company is picking up the tab, they allow you to study during work hours, and they require the cert. Those certs are just garbage, and they’re not going to become the next hot commodity. You’re not suddenly going to be an in-demand professional just because you crammed and got some irrelevant cert.

I’m not saying you’re not allowed to waste time. I love wasting time – I spend hours on TikTok, hahaha. We just need to be clear on what’s a waste of time, and what’s marketed to you as something productive and valuable for your career – when it’s really just a waste of time. (I’m picking on Microsoft certifications here, but the same holds true for lots of things that I pan.)

Yes, this means people who, say, sell Microsoft certifications aren’t going to like me. It took me years of self-examination to realize that I really shouldn’t care what those people say anyway: I don’t respect them at all, period, full stop. I actively disrespect them because they’re wasting your time and money, and that really pisses me off, dear reader.

When I say dear reader, I mean the core set of readers around here who are struggling to succeed while maintaining a work/life balance. You, the kind of people who stick around to the end of a blog post like this, the ones who have been reading this blog for years, who regularly find me in your Google search results and click on the links because you know I’ll get you an answer quickly, the ones who leverage the First Responder Kit to do your jobs better and faster.

If I think someone on a live stream is wasting YOUR time disrespectfully, I cut straight to the chase and tell them. I talked about my sarcasm on social media – I think it comes through much more strongly during my Twitch & YouTube live streams. For example, if someone comes to a free class and they didn’t follow the instructions in the prerequisites, they’re going to get a quick, to-the-point instruction that they didn’t do their homework, and we’re moving on. I’m not here for them – I’m here for you, dear reader.

Like peaches and Dita Von Teese, I’m not for everybody.

And I’m okay with that.


Sale Closes Wednesday: Save 75% on My Training Classes, and the EU/UK Can Buy Too.

Company News
6 Comments

Good news, party people: we’re finally selling to the EU/EEA/UK again!

To celebrate, during March you can use coupon code Happy20th to take 75% off my Live Class Season Pass, my Recorded Class Season Pass (Fundamentals or Masters recordings), or any of my online classes.

You can also buy my classes now in Euros, GBP, or US dollars, whichever currency is more convenient for you. When you click the Buy button, you get your choice of currency, and it calculates your VAT and helps you with exemptions as well.

I’m also offering live online classes in Europe-friendly times: 8:00-16:00 UTC. (UTC since I’m living in Iceland at the moment.) Here’s the upcoming class schedule:

When you buy a Live Class Season Pass, you get access to all of my live online classes during your subscription. During each year, I’ll teach each of the classes at least 5 times:

  • 2 in US-friendly times, 9AM-4PM Eastern
  • 2 in EU-friendly times, 8:00-16:00 UTC
  • 1 on weekends (Fri/Sat/Sun) in US times

I schedule ’em about 3-4 months in advance, so another US-friendly rotation for the Mastering classes will start around August.

See you in class!


Free Live Webcast: The First SQL Server Webcast You Should Ever Attend.

When you pass in a query, how does SQL Server build the results?

It sounds really simple, but we all take so much for granted.

SQL Server stores most data in 8KB pages – kinda like spreadsheet pages. For example, here’s a simulated 8KB page from the dbo.Users table in the Stack Overflow public database:

dbo.Users clustered index
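
If you want to peek at pages like this yourself before the webcast, one quick way – it relies on an undocumented function, so treat it as a learning sketch, not production code – is the %%physloc%% virtual column, which reveals the file, page, and slot where each row physically lives:

    SELECT TOP 10 u.Id, u.DisplayName,
           sys.fn_PhysLocFormatter(%%physloc%%) AS file_page_slot
    FROM dbo.Users u;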

In a free live How to Think Like the SQL Server Engine webcast on April 7th, I’ll walk you through how SQL Server uses these spreadsheet pages to deliver your query results.

I’ll give you a scenario: I’m the end user sending in queries, and you’re playing the part of the SQL Server engine. Using simple spreadsheets as tables, you’ll learn how SQL Server builds execution plans, uses indexes, performs joins, and considers statistics.

This one-hour session is for DBAs and developers who are comfortable writing queries, but not so comfortable when it comes to explaining nonclustered indexes, lookups, sargability, and fill factor.

To prep for the session, download and print out this 3-page PDF, and then register for the session here. See you there!


The Way Developers and Users Interact is Broken.

This month, to mark the 20th anniversary of BrentOzar.com, I’m stepping back and looking at the big picture rather than blogging about database problems & solutions like I usually do.

Twenty years ago, I tried to switch to Linux.

Again.

That wasn’t my first attempt, nor was it my last – there were so many more – and every time, I had a terrible experience. I struggled to get the right combination of hardware, software, and configuration settings. But you know what the worst part was? Trying to get support for all of it.

I saw myself as a decently skilled sysadmin and developer with a good amount of experience and troubleshooting skills, but I had one hell of a bad time getting support. Things people said would work, didn’t, and nobody seemed terribly interested in helping me get it right. Documentation, forums, email lists – it was a byzantine mess, nobody was accountable for getting things right, and I felt firsthand the impact of gatekeepers and condescending answers. The more I tried to make it work, the more burned out I got.

In 2006, I gave up and got a Mac.

I gave up the idealistic hope of a complete open source stack, configuration via simple text files, and the feeling of being a wizard. I wasn’t an operating system wizard. I just wanted something to boot up, connect to WiFi and a VPN, install applications, and sleep & resume, and when something broke, I wanted to get support from a company that would be personally accountable for getting it working again. I was fine with handing over money and permissions in order to get simplicity.

Today, I maintain the open source
First Responder Kit.

It has SQL Server scripts like sp_Blitz that help you assess your SQL Server’s health and performance. They’re the same scripts I use in my own consulting work, and are used by data professionals around the world. Back when I used to track downloads, I was tickled pink to see that even Microsoft’s own support staff downloaded them.

Today, as an open source maintainer, I see the other side of the story.

Many users don’t read the documentation. They try to use the scripts in completely inappropriate and unsupported ways. They complain when the scripts don’t do what they want. They see all this as my own responsibility, saying things like:

  • “You should make it be able to do ____”
  • “I’m very disappointed that it doesn’t ____”
  • “It needs to ____ or else it’s worthless”
  • “Please do the needful”

When I point out the readme and the documentation – which clearly state that if you want a new feature, you need to start coding it yourself – it almost always falls on deaf ears. I’m endlessly thankful to the folks who actually do contribute code, documentation, and support, because they’re so few and far between compared to those who see open source as free consulting & development.

This is hilarious timing, but as I was writing this, my watch vibrated with a direct message in the SQL Server community Slack from someone who wants help. I responded asking them to try in the #FirstResponderKit channel as the readme says so that other folks can help too. (I checked back an hour later – they never did.)

I totally understand that a lot of the First Responder Kit users who need help are just as qualified as I was when I was trying to get Linux help. I wasn’t an amateur, and neither are these users. But it’s just an endless flood: there are always way, way more inexperienced users than there are experienced users. There just aren’t enough experienced users with free time to help, and many of the people who see themselves as experienced…aren’t.

It happens with Microsoft, too:
try Azure Data Studio.

It’s one of Microsoft’s newest open source products for database developers, a SQL Server Management Studio replacement for people who write T-SQL rather than manage servers. When you run into a problem with Azure Data Studio, you’re expected to file it in Azure Data Studio’s Github repo.

There are over 1,000 open bugs, like serious bugs that I hit all the time, many of which have been open for years.

When you browse that list, it’s hard to determine if the software is buggy, if the users need better support, or if it’s a combination of both. Pro tip: it’s always a combination of both.

There are almost 1,000 open requests for enhancement, and many aren’t trivial requests: they’re monsters, like adding support for entirely new database platforms. Users feel free to ask for a pony because there’s no cost to them for making the request. Any bozo can ask for a pony by writing up a Github issue, and they can gather thousands of upvotes by posting it on social media or forums.

I don’t blame Microsoft for how this is going down. This is what happens when you build complex cross-platform software that runs on multiple operating systems and targets multiple databases. It’s the same story with Visual Studio Code’s issues, where there are also over 1,000 open bugs going back to 2016.

These aren’t open source problems.
They’re software problems overall.

It’s a problem with $1 phone apps, where users resort to leaving bad reviews and developers try to do tech support by responding to the review.

It’s a problem with Microsoft SQL Server, one of the most expensive database platforms in the world. You would think that when you hit a SQL Server problem that might be a known bug, it’d be easy to go to a web page, search for the symptoms, and see a list of known bugs and whether each one has been fixed or not.

For example, say you’re hitting issues with CMEMTHREAD waits. You go to support.microsoft.com, put CMEMTHREAD in the search box, and you get this wall of text:

There’s no sorting and filtering for database versions. You’re left to click on every single post and figure out whether it’s relevant to your patch level. There’s no commenting or questions, either. At the bottom of each post, there’s a “Need more help?” search box – which just takes you back to this search result list.

And these are only the published, fixed issues.

Not the known broken issues. Those are hidden away in secret for only Microsoft employees to access.

If you’ve been working with SQL Server for a while, you might know about feedback.azure.com, a place where you can post known bugs with SQL Server in the vain hopes that Microsoft will do something about it. In rare cases, they do – but in most cases, the requests go unaddressed. And just like its predecessor, Microsoft Connect, this feedback site is going away in 2021.

Closed source software only seems like it doesn’t suffer from the extensive bugs problem because the bug list is kept hidden by the manufacturer. We users thought the software was reliable, but if you talk to the really experienced pros in the industry, they’re often furious about just how broken common features are.

The way developers and users interact
is completely broken.

I’m hopeful that someone’s going to solve these problems during my lifetime.

Stack Overflow might be one way to solve it, but they’ve got a lot more work ahead of them. Right now, the newest incoming questions feed is an insane firehose, and the DBA.StackExchange.com new questions feed, while lower in volume, is still more than the volunteers can handle alone. Maybe Stack could solve it by doing a better job of walking users through crafting good questions, and then recruiting (and paying) vendor employees to answer questions about their products.

Microsoft’s acquisition of Github might be another way to solve it because there’s no way they’re happy with the way this is going down, either. Microsoft could tune the new-issue process at Github and start handling paid support requests through Github issues. I’m sure I wouldn’t be the only open source maintainer who would gladly charge $10 for folks to open a support case just to cause ’em to think more before asking for a pony, and offer a higher-paid issue tier with a higher service level agreement.

Or maybe it’ll be an all-new Software-as-a-Service provider, but…I don’t think that’s going to be the case, given how Microsoft experimented with UserVoice and then pulled back to their own in-house products.

I don’t know what the right answer is, but the way we’re doing it now is completely busted. I should be thankful, though, because the fact that SQL Server support is so completely broken happens to enable a delightful career in fixing that very product’s shortcomings.


Updated First Responder Kit and Consultant Toolkit for March 2021

Brad McGehee speaking to the Red Gate PASS party, 2011

Ten years ago, I unveiled sp_Blitz at the 2011 PASS Summit, in front of a live audience.

Remember the PASS Summit?

Remember live audiences?

<sigh>

Anyway, to celebrate the 20th anniversary of BrentOzar.com, I’ve been thinking back over the last couple of decades to reminisce about what I’ve learned, what’s changed, and so forth.

Open source kinda sorta existed when I got started with computers as a kid way back in the 1980s. I would buy a programming magazine, prop it up in front of the computer, and actually type programs in by hand on my Commodore 64. Lemme just tell you, that sucked. I’ve been bad at debugging for four decades.

Today, anybody in the world can install open source utilities in a matter of seconds, make their job easier, increase their value as a professional, and give back to those same open source projects. It’s not fair to say that your Github profile is the new resume – most folks just don’t have the time to contribute to open source, nor are they allowed to at their day jobs. However, I think it’s fair to say that your Github profile is one of many possible advantages when competing for a job. When someone asks, “Do you have experience with X?” it’s amazing to be able to answer, “Yes, and I’ve even contributed code and documentation to X. Here’s my Github profile, and here are some of my interactions with the team.”

So with that said, here’s this month’s new version of the First Responder Kit. It’s free – free as in speech – and it thrives thanks to the community members mentioned in the changes below. They’re real people just like you with day jobs, and they decided that they wanted to contribute or fix something. They’re the heroes that make this whole thing work.

When you find a bug or want something changed, read the contributing.md file.

To get the new version:

Consultant Toolkit Changes

I updated it to this month’s First Responder Kit, but no changes to querymanifest.json or the spreadsheet. If you’ve customized those, no changes are necessary this month: just copy your spreadsheet and querymanifest.json into the new release’s folder.

sp_Blitz Changes

  • Improvement: we ignore databases named DBAdmin, DBAtools, and DBMaintenance. (#2787, thanks Erik Darling.)
  • Fix: made all of the scripts consistently refer to https://www.brentozar.com rather than variations without https, the www domain, etc. (#2808, thanks John D. Ballentine III.)
  • Fix: removed the checks for active & inactive heaps in user databases. This is covered really well by sp_BlitzIndex now, and it’s slow in sp_Blitz. (#2818, thanks Erik Darling.)

sp_BlitzAnalysis Changes

  • This is the second release that includes sp_BlitzAnalysis by Adrian Buckman. If you want to install this script, install sp_BlitzAnalysis.sql – it’s not included in the normal installation scripts yet just to limit the blast radius in case something goes wrong. For tips on how to use it, check out the documentation. For questions or problems, see the “For Support” section of this doc.
  • Fix: expanded one of the columns from the money datatype up to decimal(38,2) to handle big data. (#2807, thanks Ian Menton.)

sp_BlitzCache Changes

  • Improvement: when @IgnoreSystemDBs = 1, we ignore queries with a context of databases named DBAdmin, DBAtools, and DBMaintenance. (#2787, thanks Iomckee.)
  • Improvement: @OutputType parameter now accepts ‘NONE’. (#2817, thanks Jefferson Elias.)
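
In case you’re wondering why you’d want a proc that returns nothing: ‘NONE’ is handy when you’re calling sp_BlitzCache from an Agent job purely to log results to a table. A minimal sketch – the logging database name here is hypothetical:

    EXEC dbo.sp_BlitzCache
        @OutputType = 'NONE',
        @OutputDatabaseName = 'DBAtools',  /* hypothetical logging database */
        @OutputSchemaName = 'dbo',
        @OutputTableName = 'BlitzCache';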

sp_BlitzFirst Changes

As we mention in the documentation, Azure SQL DB isn’t technically supported by any of the First Responder Kit scripts because Azure SQL DB doesn’t have all of the cool diagnostics stuff that we get on-premises, like sp_MSforeachdb. However, at some point in the future, we’ll officially support a lot of them up there, and this month’s changes inch us towards that.

  • Improvement: in Azure SQL DB, we now use sys.dm_db_wait_stats instead of sys.dm_os_wait_stats. (#2821)
  • Improvement: pass @OutputType parameter to sp_BlitzCache so if you specify ‘NONE’, there really is no output. (#2817, thanks Jefferson Elias.)
  • Fix: insert of changed statistics would sometimes fail. (#2738, thanks Jefferson Elias.)
  • Fix: Azure SQL DB was erroring out due to addition of sp_MSforeachdb in the recently changed statistics check. (#2819)
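
For the curious, the wait stats change boils down to a pattern like this sketch – my illustration, not the proc’s exact code; EngineEdition 5 means Azure SQL DB:

    IF SERVERPROPERTY('EngineEdition') = 5
        SELECT TOP 10 wait_type, wait_time_ms
        FROM sys.dm_db_wait_stats       /* database-scoped, Azure SQL DB */
        ORDER BY wait_time_ms DESC;
    ELSE
        SELECT TOP 10 wait_type, wait_time_ms
        FROM sys.dm_os_wait_stats       /* server-scoped, everywhere else */
        ORDER BY wait_time_ms DESC;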

sp_BlitzIndex Changes

  • Improvement: we ignore databases named DBAdmin, DBAtools, and DBMaintenance. (#2787, thanks Erik Darling.)

sp_BlitzLock Changes

sp_BlitzWho Changes

  • Fix: in the deltas view, expanded one of the columns from the money datatype up to decimal(38,2) to handle big data. (#2813, thanks Ian Menton.)

sp_DatabaseRestore Changes

  • Improvement: you can now move the full text catalogs with the new @MoveFullTextCatalogDrive parameter. (#2805, thanks Frederik Vanderhaegen.)
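
For example, a restore that relocates the full text catalogs might look like this sketch – the database name and paths are hypothetical, and the rest are sp_DatabaseRestore’s existing parameters:

    EXEC dbo.sp_DatabaseRestore
        @Database = 'StackOverflow',          /* hypothetical database */
        @BackupPathFull = 'D:\Backup\FULL\',  /* hypothetical path */
        @MoveFullTextCatalogDrive = 'F:\',    /* the new parameter */
        @RunRecovery = 1;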

For Support

When you have questions about how the tools work, talk with the community in the #FirstResponderKit Slack channel. If you need a free invite, hit SQLslack.com. Be patient – it’s staffed with volunteers who have day jobs.

When you find a bug or want something changed, read the contributing.md file.

When you have a question about what the scripts found, first make sure you read the “More Details” URL for any warning you find. We put a lot of work into documentation, and we wouldn’t want someone to yell at you to go read the fine manual. After that, when you’ve still got questions about how something works in SQL Server, post a question at DBA.StackExchange.com and the community (that includes me!) will help. Include exact errors and any applicable screenshots, your SQL Server version number (including the build #), and the version of the tool you’re working with.


Our Monday Links Email is 10 Years Old.

Company News
5 Comments

I’ve been celebrating the 20th anniversary of BrentOzar.com, but it just so happens that this year marks another fun anniversary.

Ten years ago, way back in 2011, I wrote:

You’re overworked.  You don’t have the time to sit around reading blogs, keeping up on the latest industry news, and reading web comics.

That’s where we come in.  To stay on top of our game, we have to spend a couple days per week honing our skills.  We’ve started a weekly email recapping the best stuff we’ve found in the community this week.

You can read the very first one we sent, complete with our original Brent Ozar PLF branding featuring me, Jeremiah, Kendra, and Tim:

The look has changed a lot since then, but the idea is still the same: share the best links from a week of web surfing. There’s always database stuff, but also generally interesting stuff, too.

When we decided to launch a newsletter, we did not want it to be about our own content. If you want to read the stuff we’ve written, that’s what the blog is for. It didn’t make sense to craft new content and then keep it hidden behind a newsletter subscription.

Instead, we wanted to share the best stuff that other folks have written – things that you might not otherwise find. Ten years ago, the blog was already really popular in the database industry, so we knew that if we launched a newsletter, we’d get a lot of subscribers quickly. (It worked: we’ve had over 100K subscribers for quite a while.) We wanted to help other bloggers in our industry by bringing them thousands of fresh readers.

I know firsthand how hard it is to start a blog. You toil away writing a really helpful post – only to have it disappear into the ether, apparently with no readers and no comments at all. Our newsletter tries to fix that by spotlighting the good stuff – no matter whether it’s an established blogger, or someone who’s brand spankin’ new. My favorite moments are when new bloggers say excitedly, “holy smokes, you brought a lot of readers to my blog!” I look at the newsletter as a service to both the readers and to bloggers in our industry.

And it’s worked that exact same way every Monday for a decade.

If you’re not already a subscriber, you can subscribe for free here, and you can see the most recent issue here. Enjoy!


Where I See Databases Going in the Next 10 Years

Architecture, SQL Server
33 Comments

This month, to mark the 20th anniversary of BrentOzar.com, I’m stepping back and looking at the big picture rather than blogging about database problems & solutions like I usually do.

I’m going to generalize a little by decade here to keep things simple.

In 1990-2000, databases were mostly tied to specific development languages. When you developed an application in Microsoft Access, you were very likely using Access to store the data, too. If you wrote a web page in Microsoft FrontPage, you were likely using Access or Microsoft SQL Server as the back end.

1990s Brent

During 1990-2000, one of our biggest database administration problems was server sprawl. Every time we had a new app, we just spun up a new server with a new database. Most of those servers had fairly low load in terms of performance, so as licensing costs went up, we went through consolidation projects to merge multiple databases onto fewer servers to cut costs. (This continued in the 2001-2010 time frame, especially with the advent of server virtualization.)

In 2001-2010, object-relational-mapping tools (ORMs) tried to abstract the database away. In theory, developers worked with objects, and the ORM layer mapped things to the physical database storage. In theory, some ORMs were cross-platform, promising that you could switch your database back end without rewriting your code. In practice, that didn’t work very well. In the 2000s, our app code language choice still pretty much dictated the most popular database we would use for that particular language. C#? SQL Server. Java? Oracle.

During 2001-2010, Microsoft SQL Server rode the wave of .NET popularity, being the easy choice for C# developers. Microsoft’s strategy to grow SQL Server appeared to be making SQL Server a good fit for other purposes, too. It was hard to grow revenue by convincing PHP developers to switch from MySQL to Microsoft SQL Server – the cost difference just didn’t make sense.

However, it was much easier to grow revenue by convincing ETL developers to switch from Informatica to SQL Server Integration Services. SSIS was free in the box with SQL Server, so if you were using Informatica to push data into SQL Server…why waste your licensing money on Informatica? Same thing with SQL Server Analysis Services, SQL Server Reporting Services, Filestream/Filetable to use SQL Server as a file server, XML in the database, CLR code to use SQL Server as an application server, and so on.

In 2011-2020, 3 things changed how we pick databases.

In no particular order – and each independent of the others:

  1. Developers got tired of SQL’s rigid requirements and started building open source NoSQL persistence layers (MongoDB and Redis are great examples) to solve specific challenges at a very low cost.
  2. Cloud providers made it really easy to spin up a new production-quality persistence layer, like one you might not have otherwise had the skill or spare hardware to spin up, and only pay for what you use (in theory).
  3. Hosted software-as-a-Service (SaaS) apps meant that often business folks didn’t even have to pick a persistence layer at all. Software vendors that used to have to support their apps on-premises on a variety of databases now just took that decision (and the support) away altogether. Instead of having to pick a database their enterprise customers would be comfortable with, they could use whatever they wanted.

As these 3 things continue to spread – and I use the present tense because they’re still happening – Microsoft SQL Server is less and less the obvious de-facto choice for C# developers. If you’re building a C# app in the cloud, it might make more sense to use:

  • DynamoDB if you just need key/value storage, not a full blown relational database
  • Redis if you just need a cache
  • Azure SQL DB if you just need a little code in Azure Functions, and you need relations
  • PostgreSQL if you want relational at a low cost without being tied into a cloud vendor
  • Just a plain ol’ file share if you want to save XML or JSON data for later analysis with a variety of tools
  • And so on.

Plus, those 3 things above lowered the barrier to using multiple persistence layers, each for their own specialized purpose. There were so many persistence layers available, at such a seemingly low cost, without having to worry about on-premises support for each of ’em – that developers became more emboldened to use multiple persistence layers even inside a single application.

The architect in me loves this because we get to choose the perfect tool for each task at hand.

The business owner in me is horrified because we’re heading for something completely untenable.

2021-2030: Welcome to Sprawl 2.0.

Data lives everywhere, in redundant copies, none of which speak to each other, and all of which cost us by the byte, by the month.

Performance sucks everywhere, and every query costs us by the byte of data read.

None of these persistence layers (SQL Server included) have good cost management built in for size or performance. Most databases are still in their infancy when it comes to telling you which queries are burning the most resources, let alone which queries are costing you the most money.

I got a sneak peek when I was talking to the data warehouse manager at one of my clients. He showed me their architecture diagram for their next-generation data warehouse: Azure Data Factory pulling files from sources, loading them into various databases, dumping flat files into a data lake, queried from Spark and Microsoft Big Data Clusters. He explained which parts were pay-as-you-go services, which parts were VMs, and which parts were in Kubernetes.

Then he said, “Yesterday, my costs quadrupled. The only reason we even found out so quickly is because the accounting team had an alert set up. My team has been working on it for four hours, and they’ve narrowed it down to query workloads (not storage or ADF) – but we’re realizing we don’t even have the skills to answer the question of what queries cost us the most money yesterday, or if they’re new queries, or new users, or changed database structures, or what.”

“We’re just letting people run queries and light money on fire.”

New database servers could solve this, but won’t.

When we solved Sprawl 1.0, we were at least only dealing with a handful of database platforms. We could gather all the SQL Server applications together onto fewer servers.

With Sprawl 2.0, we have:

  • Countless database platforms hosting…
  • Countless apps built with different languages and connection libraries
  • Living in a mix of different cloud vendors and on-premises

In theory, vendors could:

  • Build a single database that masquerades as multiple databases
  • Let you connect lots of existing apps down to just one platform
  • Build good cost management, making it compelling to executives
  • Maybe even build in cost reduction with automated query tuning, making it even more compelling to execs

Azure Cosmos DB is the forerunner here, I think: one database back end, but lots of APIs so that it could act as any number of different NoSQL or SQL servers. In theory – we’re talking 2021-2030 here – Microsoft could gradually enhance it so that you could take applications written for different back ends, and then consolidate them so everything lives in Cosmos DB.

But at the same time, Azure Cosmos DB is yet another new server that we have to manage, calling to mind the classic XKCD cartoon about competing standards: faced with 14 competing standards, someone builds one universal standard to cover everyone’s use cases – and soon there are 15 competing standards.

Similarly, but way less ambitious, Amazon’s working on Babelfish for Aurora PostgreSQL. The idea is that you’ll put both your Postgres and SQL Server databases in the same place, and connect your applications to it. It’ll service T-SQL queries out of PostgreSQL.

Look, I said it was less ambitious – I didn’t say it wasn’t ambitious.

Babelfish has a hell of an uphill battle because just SQL Server alone has an incredible amount of features, data types, and edge case gotchas. Switching your SQL Server application over to Babelfish on Aurora is going to call to mind the ORM lessons of 2001-2010. Existing applications are complex enough that in most cases, it’s just too hard to forklift their data out of one database, move it over to another, and expect everything to work. With both Cosmos DB and Babelfish, there will be case studies funded by vendors that will gloss over the complexity, but…these migrations will be few and far between.

(Just because someone in the audience will ask – no, Azure Arc doesn’t solve this problem of database sprawl. If anything, it enables sprawl because it encourages self-provisioning of databases & servers & services, both in the cloud and on-premises.)

At the same time that vendors are trying to build one-size-fits-all databases, they’re going to be in a race against sprawl. More new specialized databases will pop up. Existing databases will get new capabilities. The centralized, sprawl-reducing databases won’t be able to keep up.

Sprawl 2.0 is the new normal.

As a database administrator, your job isn’t just to manage the configuration, health, and performance of the database servers you have today.

In 2021-2030, your job is to:

  • Watch the options for persistence layers out there, reading developer news sites to see what new options are popping up, where they’re succeeding, and where they’re falling short
  • Help your team evaluate new persistence layers by discussing their advantages and disadvantages compared to the persistence layers you’re already using
  • Encourage the use of existing persistence layers rather than picking new ones – because every new one that comes in, you’re gonna have to learn to manage it, and you’re not exactly sitting around bored, and the company’s not hiring help for you, either, because the sales pitch on all these new persistence layers is that they’re all self-managing – and you know how that goes

You don’t have to develop a level of expertise on all of the options out there – it’s just too hard, and the new players change too quickly. For example, it’s a waste of your time to try to master a new platform like Cosmos DB if your company isn’t actually using it yet. However, you wanna learn about where it’s a good fit and what its weaknesses are – so that when your developers start talking about using it, you can understand its strengths and weaknesses relative to the ones you’ve already got.

As a blogger, I hate telling you to do something without giving you references to actually do it. In this example, I should be a good blogger and say, “Now go here to learn about Cosmos DB’s strengths and weaknesses.” The problem is that I just don’t have a good list of resources for every persistence layer out there, and I don’t have the time to go build it, either. I can’t even recommend vendor documentation sites because they’re so biased in favor of the product.

I’ll leave you with one piece of advice: when you want to learn more about a database, Google for:

  • (database name) release notes – and then read about the known weaknesses or bugs.
  • (database name) limitations – and look for features that are important to you.
  • (database name) site:news.ycombinator.com – because Hacker News surfaces a lot of interesting pro and con stories. You have to take these with a grain of salt – but the same thing is true for vendor documentation, too.
  • If it’s available, read the database’s Jepsen analysis to learn about edge case failures
  • And finally, if it’s open source, find its Github repo, and read the list of issues. Sort them by most commented, and see what problems the community has run into. Pay particular attention to how long those issues have been open, and what kind of progress the maintainers are making.

Because it’s really easy to start building a database or an online service.

It’s much harder to do it well.

And if you thought it was hard to manage just one, wait til you see how hard it is to manage a whole bunch of half-ass persistence layers.


It’s Been Decades. Why Don’t Databases Manage Themselves Yet?

SQL Server
26 Comments

This month, to mark the 20th anniversary of BrentOzar.com, I’m stepping back and looking at the big picture rather than blogging about database problems & solutions like I usually do.

A much younger me

When I started working with databases, we had to:

  • Provision the right CPUs and memory
  • Provision the right storage throughput and size
  • Install the database
  • Configure the database
  • Protect it – which sounds easy, but that also includes designing the right high availability and disaster recovery solution
  • Design the right tables, and put indexes on them based on how we wanted to query the data
  • Write fast, accurate queries to load & retrieve data
  • Troubleshoot the whole thing when it went wrong

Today, decades later…yeah.

We’re still doing all of that stuff. It’s just that we’re cave-people using rocks for tools instead of banging on things by hand.

Vendors aren’t focused on these problems.

Oh sure, they’re marketing that they’ve built something better, but when you start to rub your finger on the product, the wet paint wipes off and you see the conventional problems underneath.

Let’s take something really simple: let’s build a new Azure SQL DB:

Configuring a new database in Azure SQL DB

You get sliders for vCores and data max size.

But notice the number with the biggest font size on the screen: 150. Your choices for vCores and data size determine log space allocated, and it’s important enough that Microsoft puts it in bold, but you can’t actually control it directly. If you want more log space, you have to increase data space.

So if it’s important enough to put in bold, we must know the right number for it, right?

Nope. In Azure SQL DB, space consumption is something that’s controlled only by the database server, not by administrators. You can’t fix it by backing up more frequently: Microsoft controls your backups. Microsoft is hinting that, “Yo buddy, you better know how large your transactions are and how far behind your AG secondaries get, and if you get this wrong, inserts/updates/deletes will fail.” They’re supposed to have fixed this with Accelerated Database Recovery, which the documentation says is on in Azure SQL DB and can’t even be turned off – but it’s still an issue, or it wouldn’t be the biggest number font on the screen. Or even worse, if it is fixed, why is it being featured so prominently, suggesting to users that they need to build a bigger server to handle a log file size that doesn’t matter anymore?

<sigh>

It’s 2021, and we’re back to 2001.

We’re still screwing around with micromanaging transaction log sizes, trying not to run out of disk space. On premises, we learned the easy fix long ago: just set log file sizes reasonably, and then leave autogrow on. If one database out of fifty happens to have a runaway transaction, its log can grow temporarily. We’ll deal with it.
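
That on-premises fix is a one-time ALTER per database – something like this sketch, with a hypothetical logical file name and sizes you’d tune to your own workload:

    ALTER DATABASE StackOverflow
        MODIFY FILE (NAME = StackOverflow_log,  /* hypothetical logical name */
                     SIZE = 10GB,
                     FILEGROWTH = 1GB);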

Scaling CPU? Self-re-configuring? Nope.

Forget about storage – what about sudden demands for CPU or memory? Serverless, self-scaling databases like AWS Aurora Serverless and Azure SQL DB Serverless say they solved it with auto-scaling. That’s not solving the problem – it’s just unlimited provisioning. You wake up to a $100,000 cloud bill when some bot scrapes your site or when an app ships with an N+1 bug.

I know first-hand because our own SQL ConstantCare® product is built atop Aurora. Last year, I blogged about how a single database view cost me $50/day for months, and you know what? I’m in that same exact boat again today. Our database costs gradually crept up because we deployed more and more queries for new features, and some of the new queries weren’t well-tuned. I had an item in my backlog for weeks to go do more database tuning, and our Aurora costs crept up to over $200/day. I had to go bust open AWS Performance Insights, find the queries involved, tune them, check them into Github, have Richie fix the unit tests, and then deploy it.

Self-configuring databases? Not even close: Microsoft SQL Server 2019 and its cloud equivalent, Azure SQL DB, still ship with the same Cost Threshold for Parallelism setting that every freakin’ setup checklist still tells you to fix manually. Every release, it seems like we get thrown a bone for one or two configuration settings just so the marketing team can say the database is becoming more self-configuring. That’s great, but at this rate, it’s going to be 2070 before setup checklists can go away – if ever, because vendors keep adding features and dials that need extensive tweaking.
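
In case you’ve somehow dodged that checklist item, here’s the manual fix in question – 50 is just a common starting point, not a magic number:

    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'cost threshold for parallelism', 50;
    RECONFIGURE;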

Self-patching? Well, kinda, but only in the worst way. In Azure, you don’t get a truthful changelog, you don’t get to test patches before they hit your production server, and you don’t get to control when they land. Just this month, in March of 2021, Azure SQL DB finally announced that you can pick what window they’ll use for maintenance to take your server down, and get alerted before they’re gonna do it. You can choose between:

  • Every day, 5PM-8AM local time, or
  • Mon-Thurs 10PM-6AM
  • Fri-Sun 10PM-6AM

That’s it. Those are your only choices. And that feature is only in preview, and only in select regions. If vendors really cared about database administrators, this stuff never would have gone into production like this.

Somebody in the audience is going to say, “Does Azure Arc help?” Well, it purports to, in the sense that it puts both your on-premises SQL Servers and Azure instances into the same Azure control panel, gives you self-provisioning, and has policy-based management. Given Microsoft’s track record with Policy-Based Management and the Utility Control Point, both of which were grandly-announced features that quietly died off without getting fixes or improvements, I’m gonna wait to see if this is another one-and-done turkey.

Vendors are focused on selling,
and database administrators don’t buy databases.

As important as you think you are, you’ve likely never written a check for database licensing or cloud services.

The reason you have a job is that your employer implemented a database, it grew over time, and it grew beyond the self-managing capabilities touted by the brochure. You were hired long after the database choice was made. And no, you’re not really able to influence future purchasing decisions because that ship has sailed: your company picked a persistence layer, and when it comes time to pick additional persistence layers for new products, you’re only going to be one of many people sitting at the table.

Databases are bought by two roles: developers and executives.

When developers make database decisions, in most cases, they’re not experienced enough with multiple database platforms. They’re either familiar with one, and that’s the one they pick, or they’re not familiar with any, and they’re forced to make a selection based on the vendors’ marketing material, what their friends say, what they read on HN, or some meaningless database rankings site.

When executives make database decisions, they either work from feature lists like EnterpriseReady.io, looking for things like auditing, role-based access, Active Directory integration, or else…they work off the vendor’s brochure, and self-managing is just another feature in a feature list.

Developers and executives buy features.

I don’t mean this in a negative way! It’s the same way we buy appliances or cars or laptops. We have a rough idea of the features we need. Products rarely have all of the features we want – at least at the price point we’re willing to pay – and so we have to make compromises. We might accept a half-baked version of one feature because we want another feature so badly.

In 2021, every database’s brochure has “self-managing” slapped on the brochure somewhere, and every vendor will show just enough demos to hoodwink non-DBAs. “Self-managing” doesn’t have to be fully true in order to sell – just like “secure” or “performant” or “scalable.” Everybody uses these keywords. They don’t mean anything.

Even worse, self-managing is just one of the features people are looking for. Every time a database vendor slaps on another feature, that feature has dials and switches to configure it – and when the feature is brand new, those dials and switches have to be configured by hand. There’s never a best practices doc – the feature has to ship first before the real world learns the best practices, and we all write documentation to tell each other how to set the dials and switches.

Want proof? Check out the home page of Couchbase.com. Look at the kinds of features that they tout. They emphasize the power of the product, empowering you to do all kinds of things – but you still have to actually do those things, like size it, monitor & tune the queries, configure the security, and configure all the settings. It simply isn’t self-managing – and it doesn’t matter, because people love it. They’re going public with a $3 billion valuation. The market has spoken, and features are where the money is.

It’s a race between new features
and self-managing those features.

Sure, vendors can say that a brand-new feature is self-managing – but again, rub your finger on it, read the release notes, run an edge case performance test, and you’ll pretty quickly find the undocumented dials and switches that need to be turned just so in order to make this pig fly.

New features will always win the race.

They have to. Vendors are in their own race against other vendors, racing to ship features that nobody else has, or to keep up with features that someone else has already shipped.

This influenced my own career:
I gave up on production DBA work.

Ten years ago, when Microsoft first brought out Always On Availability Groups, I was ecstatic. I was doing a lot of production database administration work at the time – making sure SQL Servers were highly available and well-protected – and I saw Availability Groups as a better way to achieve that goal.

But then the features versus self-managing race started.

The good news was that Microsoft poured resources into AGs, adding all kinds of features to them. More replicas. Cross-database transaction support. Availability Groups that spanned multiple Windows clusters.

The bad news was that not only were these features not self-managing, they were practically unmanageable: the documentation was terrible, the GUI was non-existent, the damn thing broke constantly in crazy unexpected ways. (For fun details, read the past blog posts on TarynPivots.com, Stack Overflow’s DBA.)

By the time Microsoft said they were bringing Availability Groups to Linux, I said okay, I give, no more, time out. I had come to see SQL Server high availability as a Sisyphean task: I could never really be done with that miserable work, and even when I thought I was done, some new surprise knocked me flat. Even worse, those surprises always came after hours, on weekends, on holidays, when all I really wanted to do was relax.

If you do want to learn Availability Groups, Edwin Sarmiento is about to open a new rotation of his class. He only does a few of these per year, and it’s a hybrid of consulting, coaching, and training. Learn more here.

I focused on development DBA work instead.

Here’s a quick rundown of the difference between developer, development DBA, and production DBA:

  • Developers build the applications and write the queries that hit the database
  • Development DBAs focus on making the database fast: designing tables and indexes, and tuning queries
  • Production DBAs focus on keeping the database up: backups, high availability, security, and patching

I chose to move from Production DBA to Development DBA. These days, my consulting work and my training classes focus on solving performance emergencies. I can at least schedule those engagements in advance because companies know when their performance is slowly getting worse, and we can work together during weekdays, business hours, in order to turn things around. (As opposed to production outages, which always hit at the worst possible times when I’ve just made a fresh gin & tonic.)

Production DBA work is still there, mind you, and every month I get emails from folks asking me to record new training classes on how to do it. I’m just flat out not interested in that work, and I’m not interested in doing training on it anymore.

So with that in mind, I put our Fundamentals of Database Administration class and Senior DBA Class on YouTube, for free, forever. They’re older classes, and they don’t have the best audio & video quality, but I’d rather just put them out there for the public to help the folks that need ’em. Hope that helps – and I also hope that database vendors start taking the self-managing thing more seriously, for the sake of the production DBAs who still have to deal with this mess.

In my next post, I’ll talk about what I think the next 10 years of databases will bring, and how that affects our job duties and career prospects.


What Does a Database Administrator Actually Do?

BrentOzar.com is 20 years old, and I’m celebrating by being a little retrospective and future-looking this month. Today, I’m publishing a post that I could have published 10 years ago, or maybe 20 years ago, and maybe even 10 years from now. How much of this would have changed in the past? How much will change in the future? Think about that as you read through this. Here we go…

A DBA makes sure the databases are backed up, well-configured, patched, secure, and fast.

I explain my job to acquaintances by saying, “You know how some web sites are really slow, and their checkout times out, or it loses your data? It’s my responsibility to make sure those problems don’t happen. I’m not the only person involved – there are also developers and engineers – but I’m responsible for the part that holds your data, like your profile, your orders, your browsing history, and all the data that the evil companies have been gathering about you. I’m not sure I’m supposed to say that last part out loud.”

The job duties of a database administrator include:

  • Ensure all database servers are backed up in a way that meets the business’s Recovery Point Objectives (RPO) – see the example query after this list
  • Test backups to ensure we can meet the business’s Recovery Time Objectives (RTO)
  • Troubleshoot database service outages as they occur, including after-hours and weekends
  • Configure monitoring utilities to minimize false alarms
  • As new systems are brought in-house, choose whether to use clustering, log shipping, mirroring, Windows Azure, or other technologies
  • Install and configure new database servers
  • Patch existing database servers
  • Deploy database change scripts provided by third party vendors
  • When performance issues arise, determine the most effective way to increase performance including hardware purchases, server configuration changes, or index/query changes
  • Document the company’s database environment
  • Help the company lower their hosting costs by performance tuning, consolidating, and re-sizing
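
For example, here’s a minimal sketch of the kind of check a DBA might run for that first duty – flagging databases whose full backups are falling behind an assumed 24-hour RPO. The 24-hour threshold is made up for illustration; the real number comes from the business:

    /* Flag databases with no full backup in the last 24 hours: */
    SELECT d.name AS database_name,
           MAX(b.backup_finish_date) AS last_full_backup
    FROM sys.databases AS d
    LEFT OUTER JOIN msdb.dbo.backupset AS b
        ON b.database_name = d.name
        AND b.type = 'D' /* D = full database backup */
    WHERE d.name <> 'tempdb' /* tempdb can't be backed up */
    GROUP BY d.name
    HAVING MAX(b.backup_finish_date) < DATEADD(HOUR, -24, GETDATE())
        OR MAX(b.backup_finish_date) IS NULL;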

Why is a database administrator important?

When the database goes down, the business stops. If you’re selling things online, and your web site goes down, that’s an emergency. Even if it’s not down – if it just slows down – that can be an emergency too. DBAs help prevent that problem.

When someone accidentally deletes data, the business can stop. Unfortunately, in all too many companies, there are too many folks with access to that data – and those folks are often untrained. They’re just human, and they make mistakes. Database administrators help recover from that problem.

Where do database administrators work?

Since DBAs are involved with protecting important data, you’ll find them everywhere data is important – and these days, that’s everywhere! However, most companies with just 1-2 databases don’t have enough work to keep a full-time employee busy. In these kinds of shops, you’ll find a person who spends some of their time doing database administration. We call those folks “accidental DBAs” – they didn’t mean to become DBAs, but they inherited some of the duties anyway.

The larger a company is, the more DBAs they’ll have. It’s common for big national companies to have entire teams of DBAs.

What’s a typical database administrator salary?

According to our annual Data Professional Salary Survey, DBAs in the United States make an average of $112,880. To slice and dice those numbers or dig into other countries, download the raw data in an Excel spreadsheet.

As with any other profession, folks make more money when they have more experience or live in higher cost-of-living areas.

Can remote database administrators work from home?

They can – and let’s face it, right now everybody’s working from home anyway – but most companies prefer to have their DBAs show up in the office. There’s a level of paranoia around the DBA position because DBAs protect one of the company’s most precious assets: their data. DBAs get pulled into a lot of ad-hoc meetings due to emergencies, and due to the level of severity of those emergencies, managers wanna see faces and lock people in rooms.

Some companies are willing to hire purely remote DBAs, but those jobs are few and far between. When they do, it’s usually because:

  • The company is in a hard-to-reach location – for example, one of my clients is in a small town, and they simply haven’t been able to find DBAs in their local market, nor have DBAs been willing to relocate to this town. The company had to settle for a couple of remote DBAs instead.
  • The company wants very specific hard-to-find skills – for example, one of my clients needs serious production experience with Distributed Availability Groups, a rarely-used feature, and there just aren’t that many DBAs in the world with those skills. The folks who have it can dictate their job requirements, like working from home.

But these are exceptions rather than the rule.

What’s a database administrator versus a data analyst?

A data analyst writes queries to get answers from the data, whereas a database administrator makes sure that the answers come back quickly.

A data analyst spends a lot of their time writing new queries from scratch using SQL, reporting tools, or data science tools. They usually work directly with business users, and they know more about what the business does. They know what the columns in the tables mean, and they know how the different tables relate to each other. They also spend time interpreting those results: writing reports, talking to the business users, and clarifying requirements.

A database administrator usually has less business-specific knowledge. They know more about systems that hold the data, like Microsoft SQL Server, Postgres, MySQL, Oracle, etc. A DBA will know which users run which queries the most, and they may even help the data analysts craft better queries, but they’re not usually writing the new queries from scratch.

This gets a little confusing because some businesses just slap the term DBA on everything. “If they’re working in the database, they’re database administrators,” hahaha.

What’s the difference between a junior DBA and a senior DBA?

A junior DBA is doing almost all of their tasks for the first time. They rely on Google for pretty much every task they need to accomplish.

A senior DBA has spent a couple/few years working as a junior DBA. They’re used to doing the common tasks like configuring backups, troubleshooting outages, managing security, and responding to end user requests. That’s not to say they don’t still do tasks for the first time – they often do, and they rely on Google to help them figure things out. But at least half the time, they’re doing things they’ve already done before, and rather than Googling the concepts, they’re Googling the most efficient way to do it at scale, like automating it with PowerShell.

The number of years of experience has nothing to do with whether someone’s a junior or a senior. I’ve seen a lot of people who have 10 years of experience, but they’re still a junior – because they’re still doing exactly the same things they did 10 years ago, the same way they did it back then. They haven’t done any learning on how to automate the more common tasks, or how to accomplish more challenging ones.

What tools do database administrators use?

When administering a single SQL Server, they usually use SQL Server Management Studio. It’s a free tool from Microsoft that’s over a decade old, and was written from the start as the main console for the database server. It’s fairly uncommon to see DBAs using third-party tools to manage servers, just because SSMS has been out for so long and covers so much functionality. I spend most of my time in here.

When administering lots of SQL Servers, DBAs use PowerShell, usually with the DBAtools.io framework. You can get started learning that with the book Learn PowerShell Scripting in a Month of Lunches.

For writing new code, Microsoft wants us to use Azure Data Studio. It’s like Visual Studio Code, a cross-platform app targeted at writing new stuff from scratch. It can do some administration tasks, but it’s really targeted at developers. I spend a little of my time in here because I write queries for both Microsoft SQL Server and Amazon RDS Aurora Postgres, and Azure Data Studio works with both of those.

What training prepares you to get the first DBA job?

Companies almost never hire DBAs without on-the-job experience. To get that experience, there are two common career paths.

Get a job as a developer first, and spend a lot of time querying SQL Server. As your job progresses, you’ll eventually design new SQL Server objects like tables, indexes, and stored procedures. Then you can gradually become “the database person” in the office, and if it makes sense for your career, gradually transition into a dedicated database administration role.

Or, get a job as a sysadmin first, and as part of your overall server administration duties, you’ll come in contact with database servers. As your job progresses, you’ll install new database servers, patch existing ones, and troubleshoot them when they break. You can then gradually become “the database server person” in the office, and if it makes sense for your career, transition into becoming a DBA.

In both cases, you’ll have a hole in your knowledge. Developers may not know the installation & configuration side. Sysadmins may not know the querying side. That’s okay – you can learn the remainder of it when you become a full time DBA.

If you’d like a jump start, I have free YouTube courses on the Fundamentals of Database Administration, and a free Senior DBA Class.

Can freshers get a job as a database administrator?

If you’re fresh out of college, you’re unlikely to go directly into a DBA job. You’re better off getting a developer or sysadmin job as I described above, then working your way into a DBA position.

When you do find companies hiring freshers as remote DBAs, they’re usually training the employee for basic DBA job duties. Be careful here: these jobs are extremely low-paying, and they’re probably not going to get you a great DBA job after that. You’re going to have to leave the company to get a better job, but when you take the “expertise” that you learned from this company and try to use it to get the next DBA job, you’re likely to be disappointed. The company was teaching you a bare minimum of things just to bill for your time, and it’s not really seen as real-world DBA experience.

Are database administration certifications enough to get a job?

In the Microsoft SQL Server world, no: Microsoft’s certification programs like the Azure Database Administrator Associate just don’t accurately reflect what database administrators do. They reflect what Microsoft wants DBAs to learn – because Microsoft wants people to evangelize Microsoft cloud services inside their company.

Sadly, Microsoft certifications just aren’t a good path to get your first DBA job. Instead, focus on getting a job as a developer or sysadmin first, then become “the database person” in your shop, and go from there.

Can you be a database administrator without a college degree?

Absolutely – I dropped out of college myself. However, you’ve gotta be willing to put in the years of work doing development and systems administration first before you get a DBA job.

I wouldn’t use myself as an example of “the right way” to get a DBA job. It’s possible – it just takes a long, long time. I would also say that I’m a DBA because I dropped out of development! I didn’t have the patience to continuously learn new languages and frameworks, and debug in all kinds of different browsers. Database administration was easier for me, so I stuck with that.

If you don’t have a degree, take heart: just get work as a developer or systems administrator instead. It’s going to be easier (not easy, just easier than jumping straight into database administration). Keep following the things you love to do, and avoid the things you hate.

Will DBAs still have a job in the cloud?

I get this question a lot as companies start to consider cloud services like Azure SQL DB.

For the answer, read back through that list of job duties at the top of the post. Several of ’em are indeed covered by self-managing cloud database platforms, but many are not. (For example, if you want to keep your backups longer than 35 days, that’s currently only in preview, and in only some Azure data centers.) In addition, performance pricing starts to become a real hassle in the cloud: as your workloads grow, so do your costs. DBAs can now pay for themselves by helping performance tune workloads to reduce hosting costs.

Is database administrator a good career?

If you’re willing to put in the years of work that it takes in order to gradually move into a DBA position, it’s fabulous. You’re the center of a company’s data, which means you know what’s going on everywhere. You know all of the IT staff, you know what projects are happening, and you can help empower everyone to get their job done.

However, it comes with drawbacks:

  • It’s hard to get your first DBA job
  • It’s hard to graduate from junior to senior DBA without changing companies
  • Most DBA positions are on-call, and the on-call work can be exhausting
  • Some DBA positions have tons of responsibilities, but no authority to change things for the better: you’re stuck maintaining a broken system

It’s been a solid career for the last couple of decades, and things look pretty rosy for the 2020s and 2030s. In my next post, I’ll talk about why databases aren’t self-managing yet, and how that affected my last 10 years.


#TSQL2sday 136 Wrap-Up: Your Favorite (and Least Favorite) Data Types

Development
9 Comments

For this month’s T-SQL Tuesday, I asked you to blog about your most-loved and least-loved data types.

Crazy, right? How could people possibly love or hate data types? Well, if you’ve been working with them for a while, I figured you’d have built up an array of tips or pain points, and y’all delivered with 29 interesting blog posts.

Secretly, when I picked the topic, I hoped we’d end up with a couple of data types where people disagreed so much that it was both a favorite AND a least favorite. Indeed, y’all delivered on that too!

Here’s what you wrote, grouped by data type, each in lower case as Aaron Bertrand intended:

bigint

bit

  • Bit versus other data types by Kevin Chant – “So, as you can see in this slightly biased post the bit data type wins here.” (I laughed pretty hard throughout this one because it was not what I was expecting.)

date

datetimeoffset

  • Datetimeoffset Uniqueness by Rob Farley – “the time zone thing isn’t about whether they are actually identical or not. It’s about whether they are considered the same based on the things that the SQL engine uses to consider them equal.”
  • My two least favorite data types by Aaron Bertrand – “it’s easier to convert from UTC to some other time zone than it is to convert from some time zone that may observe DST to some other time zone that may observe DST.”

float

int

  • 4 Things I Love About Integers by Brent Ozar – “Integers are clean, elegant datatypes that do exactly what they’re supposed to. They’re the Golden Retrievers of the database world: they have a few shortcomings, but so, so few.”

money

smallint

spatial

  • Mapping visualizations and geo-spatial coding by Tom Huguelet – “the fact that we got so much for nothing, and the fact that these data types behave like objects with methods, and the fact that mapping is such an important type of visualization for analytics, make SQL Server Spatial Geography Types my favorite.”
  • Spatial is My Favourite, Yet Most Pointless Datatype by Greg Dodd – “I’d rather store Latitude and Longitude as decimal fields and have an application layer do the hard lifting, where I can send things parallel, where I can do some maths to work out the locations I want and index them if I need to.”

sql_variant

text

timestamp

  • The Datatype Blog by Steve Jones – “It has been deprecated, but the timestamp type is still around. It’s not in the list, but it is mentioned as a synonym for rowversion. This is a unique binary number in each database, which is often used to detect changes in a row.”

uniqueidentifier

varbinary

varchar

xml

They’re all good datatypes, Brent

Photo by Barn Images on Unsplash

My thoughts overall

In my training classes, I often compare our work to carpentry.

When you just get started with carpentry, you only have a tool or two. The longer you work in the field, the more tools you amass, and the better you understand how to use them.

This is a discussion we could have had 20 years ago, or 10 years ago. The answers change, too: just because a tool had a valid purpose once doesn’t mean it still does, because better options arise. Sometimes, new options arise, but…they’re not actually better. (I’m looking at you, JSON.)

We’ll still be having this discussion 20 years into the future. I’m celebrating the 20th anniversary of BrentOzar.com, and I can’t think of a better T-SQL Tuesday topic: I could have published this in 2001, 2011, 2021, 2031, or 2041. We’re still facing the same carpentry projects, but the tools we use and the ways we use them gradually change over time.

If you had fun writing (or reading) about this, and you’ve got an idea for a future T-SQL Tuesday episode that you’d like to host, tweet @way0utwest or leave a comment over here.


How I Worked Toward GDPR Compliance: The Really Long Story

Company News
17 Comments

BrentOzar.com is 20 years old, so I’m spending 20 days reflecting. Yesterday, I wrote about how my online classes are available to Europeans now, and today I’m going to talk about why that was a problem for a while, and how I solved it.

The short story is that I outsourced the whole thing to Teachable.

For the long story, buckle up: there’s over 2,500 words of navel-gazing in here.

Twenty years ago, BrentOzar.com ran on my own code.

Me at Florida Code Camp, 2007
Back in my code-slinging days

Back in 2001 when I started writing it, BrentOzar.com served two purposes: it helped me share stuff I’d written, and it helped me become a better developer. I liked coding and wanted to do it in my spare time, too, and since I wasn’t satisfied with the blogging platforms circa 2001, I wrote my own.

When you’re facing a choice of build vs buy, like build your own blogging platform or use an existing one, there are a lot of variables involved:

  • The quality of the buy-able options out there (including the free ones)
  • Their cost (which is sometimes licensing, but also the hardware it runs on, and the maintenance costs)
  • The quality of a solution you could build yourself
  • That self-built solution’s cost (because your time is worth money, even if that number is small)

In 2001, when I made the build-vs-buy decision, the available blogging platforms just weren’t that good – the market was too immature – and my personal time cost was really cheap. I wasn’t making a ton of money at my day job as a developer & database guy, so I figured sharpening my development skills would raise my day job skills anyway.

After a few years, that got old,
and I switched to blogging software.

In 2003, I switched to Movable Type, and then in 2006, I switched to WordPress. Blogging had taken off, and the blogging platform market had grown. The quality of what I could get off-the-shelf was way better, so I revisited the build-vs-buy decision.

I decided that if I was going to spend a Saturday sitting in front of a computer, I could either spend that time tinkering with my own blogging platform – OR, I could spend that same time writing blog posts, sharing my knowledge. I decided that the latter was a better use of my time.

Developers make the build-vs-buy decision constantly today, way more than they ever did before, because it presents itself not just with each software platform decision, but with every piece of logic they wanna write. Wanna, say, process XML? You have to decide whether to use a library, call an online service, or write the code yourself – and the choice can come back to haunt you if a service or library becomes abandoned, or reinvents itself in a new version that requires tons of code changes. With every framework decision they make, developers are practically business owners making long-term decisions that affect the cost of a project. Get the best ones you can. (Yay, Richie!)

When we started selling online training,
we had the build vs buy decision again.

At the time we made the decision, we thought the online training sales market was mature. There were a lot of learning management systems (LMS’s) out there, with a wide variety of price ranges and features. We hired an LMS consultant to help us navigate the choices, and we settled on a pretty expensive one.

Me circa 2011, by Matt Slocum

It was a horrible mistake.

The software was garbage. It charged some students, yet told them their credit card was denied. It double-charged some students. It let some students in without even charging them at all. We worked with their support teams over and over, and nothing got better. (Years later, I found out that particular vendor was a bit of a hollow showpiece: they had one really big paying client, and they focused all of their development & support on that one client. Everyone else was an afterthought.)

As the angry customer emails kept piling up, we were forced to revisit the decision.

We moved the training sales in-house,
using WordPress and WooCommerce.

We certainly couldn’t afford to build an e-commerce system from scratch, but we could piece together a bit of a hybrid:

  • WordPress for the content and users
  • WooCommerce for the e-commerce
  • A variety of WordPress plugins to enable training classes
  • Some custom code to glue it all together – some Zapier, some custom WordPress code

This worked really well for years – until 2017, when the EU’s General Data Protection Regulation (GDPR) approached. Back in 2017, it just wasn’t clear how the GDPR would be enforced, but the penalties were terribad: up to €20M or 4% of your company’s annual worldwide revenue, whichever is higher. As a small business owner, I looked at those numbers and went, “Nnnnnope.” Only 5% of my sales revenue came from the EU. Just to continue selling to the EU, I would have to invest much more than that 5% of my revenue – and it simply wasn’t worth the investment at that time.

I knew I wanted to be fully GDPR-compliant someday – I do love the ideas behind the law – so I took the steps that made sense for me, like requiring folks to opt in before emailing them, implementing a privacy policy, and processing GDPR portability & erasure requests.

However, I didn’t see a way for the combination of WordPress and WooCommerce to be fully GDPR-compliant. For example, customer data isn’t encrypted – so if someone gets your WordPress database backups, you’re screwed. Even worse, in the WordPress plugin world, it’s very common for plugin developers to say, “For me to debug the problem you’re having with my code, I need full access to your production servers, or else I need a copy of your database.” I just couldn’t do that and still feel confident about GDPR compliance, not without doing a whole lot more legal prep work with each and every plugin developer.

I also didn’t want to dedicate Richie (my developer) to working on it. BrentOzar.com is PHP & MySQL, and that’s not what I hired him to do. He uses serverless technologies to enable PasteThePlan, SQL ConstantCare®, and another project we’re working on, and he already deals with enough context switching. I’ve got enough work to keep him busy for years, and I just wouldn’t want him working on the joyless tasks involving GDPR compliance. I’d have to hire another developer, and the ROI just wasn’t there.

So rather than change our sales platform, I stopped selling to the EU and EEA. That raised a lot of eyebrows at the time, but I still think it was the right thing to do as a small business owner. I heard a lot of database folks say, “Just become compliant – how hard can it be?” I had to laugh, especially when those same folks had wildly non-compliant WordPress blogs, hahaha.

COVID-19 changed the training landscape.

I was probably thrown out of Iceland for this photo

In March 2020, I was leaving Iceland based on the US State Department’s recommendations for US citizens to get home as quickly as possible before borders locked down completely. I knew the conference & training game was about to change dramatically: everybody was going to be learning at home for the foreseeable future.

To deal with the new reality, I raised my online training game: I switched from GoToWebinar (which did a terrible job of GDPR compliance) to streaming with OBS, invested in cameras & hardware, and set up scenes to make training way more interactive. Those changes were successful, and helped the classes sell really well to the American audience. That was my single biggest goal for 2020: taking care of the home crowd. In March 2020, when I was making those decisions, I had no idea how long the quarantines would last, or when vaccines would be available. I figured I’d just take it one year at a time.

The SQL Server business wasn’t the only one impacted by the quarantines, obviously: everyone has to get their training remotely now. 2020 was the year when we saw massive improvements in all kinds of software and SaaS platforms targeted at remote meetings and training.

So this year was a really good time to revisit the build-vs-buy decision. I wanted a training sales platform with:

  • The ability to host both recorded and live classes
  • GDPR compliance
  • EU & UK VAT payment & filing – because I didn’t want to hassle with the complexities of international tax filing in order to gain just 5% more revenue
  • Powerful & flexible pricing: subscriptions, one-time pricing, and coupons

And I decided that I would be flexible on any other features – just because I’d gotten used to something in my current WordPress/WooCommerce setup didn’t mean it would be a requirement in the new training class sales platform.

After a lot of research and testing, I settled on Teachable.com. I migrated my training classes over there, and I’m going to process all class sales through there from here on out. Customers who have existing subscriptions in my WooCommerce/WordPress store can still continue to access their material exactly the same way, and their renewals will continue to work. New customers will just be signing up on Teachable.

Teachable handles a lot for me.

International sales are a giant pain in the rear because the EU and UK charge value-added-tax on online sales these days. If you sell any goods, even virtual ones, the governments want their piece – even if you don’t have a business presence in the EU or UK.

In the past, this meant registering with every single government where someone bought something online, or registering for a VAT Mini One Stop Shop (MOSS) where you filed all your EU taxes, plus there’s probably something separate now for the UK due to Brexit, and frankly, in the words of a wise philosopher, ain’t nobody got time for that.

Even worse, even when we went through all that several years ago, we still constantly fielded tax questions from people. “I’m trying to buy from South Africa using a French credit card over a Swiss VPN, and I believe I’m tax exempt because I wear white shirts on Thursdays.” I’m sure those questions are still going to come in, but…now I can just point them over to Teachable support and call it a day.

Teachable figures out where you’re buying from, handles the VAT number exemptions, collects the taxes, and even files the taxes and pays them for you. They just pay you out daily, weekly, or monthly (your choice) – and then you deal with your own corporate or income taxes based on your overall royalties. You still get all the control in terms of your course pricing, coupons, and availability, but Teachable just handles the accounting part. It’s doable for them at their scale – whereas for me as a small business, it totally wasn’t doable in a profitable way.

They charge for it, of course.

When I used WooCommerce & WordPress, we had to pay a fixed price for our plugins and hosting charges, and a credit card processing fee to Stripe for each purchase. (When we sold to the EU/EEA, we also paid variable overhead to accounting folks based on how much taxes we were filing.)

Teachable has monthly pricing, plus fees per transaction. I took a screenshot of a Recorded Class Season Pass sale to illustrate how things get a little complex, but before the screenshot, here’s the simple math:

  • Course price: $99.86
  • I got paid: $92.99 (about 93%)
  • Teachable got: $2.00 (2%)
  • Stripe got: $4.87 (5% – it looks higher on here, but that’s because they also got a cut of the tax)

The details, which also include currency conversion and taxes and coupons, oh my:

In the past, I’ve repeatedly said that the EU was about 5% of my revenue, so the question becomes: is it worth paying 7% of everyone’s revenue in order to get 5% additional revenue? Obviously not – but that’s not a fair question. We’ve always paid Stripe’s credit card processing fees, so that 5% is a wash. It’s really just a question of, “Is it worth paying 2% more on everyone’s revenue in order to get an additional 5% of revenue?” But that’s not really fair either: the situation is more complex because I can’t use Jilt for revenue recovery, for example, so revenue might go down, too. It’s hard to make these judgement calls in the fog of war. I don’t think I can boil it down to a simple equation, and the numbers are too close to call.

But if they’re too close to call, then I need to do it, because I want to get my training into the hands of more people.

Like any build-vs-buy decision,
there were compromises.

Some of the compromises were related to the GDPR, and others were related to Teachable’s product.

For example, when someone abandoned their shopping cart, I used Jilt.com to send them a reminder, and then a coupon to entice them to finish the checkout process. Jilt worked amazingly well, paying for itself many times over. However, this just isn’t an option with Teachable.

Similarly, when folks bought a BrentOzar.com training class, I automatically added them to Mailchimp, the service I use for email newsletters. I didn’t actually subscribe them to any newsletters – I just wanted them in the database so that when I sent out an update about their training class, I could email them through Mailchimp. Teachable has the ability to do that – but I just decided not to because for the GDPR, I want to minimize the number of places where customer data is kept.

Another compromise is that Teachable doesn’t really specialize in live training classes. You can embed live streams in the course content, but it’s not like Teachable has event calendars and the ability to buy different specific dates. I wouldn’t have a problem selling seats in individual classes on individual dates, but it would have been painful enough in Teachable that I just skipped it. Plus, the vast, vast majority of my live class attendees just buy a Live Class Season Pass anyway. The folks looking to save money or pick and choose specific classes usually ended up buying the recorded versions, which were cheaper – and they can still do that. I didn’t change the pricing on those. If Teachable ups their game and offers easier individual live ticket sales, I’d embrace that – but for now I’m keeping it simple.

I’m fine with compromises like that. If I wanted the perfect online training sales platform for me, then I’d have to hire a team of developers – and the cost on it would be stratospheric, and I’d end up spending too much of my time managing the developers and the platform – and not enough time delivering value to attendees.

I liked the WooCommerce/WordPress combination more than I like Teachable, but Teachable enables me to get the classes into the hands of more students, with less time taken on my part. COVID-19 changed the equation there: I just don’t see in-person conferences happening anytime soon, and the customer population in the UK, EU, and EEA continues to grow. I had to be able to service those customers.

Those customers being, uh, you, dear reader.

The work was worth it.

After spending weeks researching tools, picking one, migrating my content over to it, building a bunch of marketing, and going live, the numbers are starting to come in.

Before we stopped selling to the EU, the EU/EEA represented about 5% of our training revenue. In last year’s Black Friday sale, we sold $1M USD. 5% of that would have been another $50K. So leading up to selling to the EU/UK, I defined success as $50K of revenue in the March 2021 launch sale.

48 hours into the sale, I’m happy with the numbers. It’s a success, and the work paid off.

Wanna join in? During March you can use coupon code Happy20th to save 75% off my Live Class Season Pass, Recorded Class Season Pass Fundamentals, or Masters recordings, or any of my online classes.

You can buy in Euros, GBP, or US dollars, whichever currency is more convenient for you. When you click the Buy button, you get your choice of currency, and it calculates your VAT and helps you with exemptions as well.

I’m also offering live online classes in Europe-friendly times: 8:00-16:00 UTC. (UTC keeps it simple since I’m living in Iceland at the moment.)

See you in class!


Wow: BrentOzar.com is 20 Years Old.

Company News
18 Comments
My first avatar, by Jin Wicked

In 2001, I was a developer, and I liked to write.

I’d had web sites for a number of years, but in 2001, I decided to switch from wickedlife.com (archive) over to BrentOzar.com. I wanted a domain name that would more appropriately reflect my interests as they changed over time, and I figured that I wouldn’t always be interested in the goth lifestyle, hahaha.

There were a couple of blogging applications available at the time – this was years before WordPress – but being a developer, I wanted to write my own as a learning experiment. Armed with Macromedia Dreamweaver and a few books, I wrote my own blog platform and started blogging.

In the early years, the blog posts ranged from going to Fry’s, using technologies I’ve long since forgotten, building a car computer, wishing Mom & Caryl a happy Mother’s Day, marketing slogans for girlfriends, and, uh, buying pantyhose to make my own aquarium filters. I kept a webcam on 24/7, which led to some amusing photos of me working. In later years, I started writing the personal stuff over at Ozar.me, and I focused BrentOzar.com on technology, specifically Microsoft SQL Server.

A lot has changed in the last 20 years. Hell, a lot has changed in the last one year!

But some things remain the same:

  • We – you and I – are a community, a tribe.
  • We support each other and help each other grow.
  • As we grow, our job roles may change, so we’ll drift out of some tribes over time. That’s totally okay.
  • You get what you give, and there are so many ways to give: write blog posts, give presentations, volunteer with open source projects, or help coordinate events, virtual or in-person. (Okay, maybe that last one has changed a little.)

Over the next 20 days, I’m going to spend some time here reflecting on things we’ve learned over the last 20 years – stuff that’s changed, stuff that hasn’t, moments that I’ve been proud of, and where I think things are heading. Some of it will be technical, some of it about the industry, and some of it will be navel-gazing.

Thank you for being here for this wonderful couple of decades. I really appreciate you. You make this fun for me.


Save 75% on My Training Classes – Europeans Too!

Company News
53 Comments

Good news, party people: we’re finally selling to the European Union, European Economic Area, and the United Kingdom again!

To celebrate, during March you can use coupon code Happy20th to save 75% off my Live Class Season Pass, Recorded Class Season Pass Fundamentals, or Masters recordings, or any of my online classes.

You can also buy my classes now in Euros, GBP, or US dollars, whichever currency is more convenient for you. When you click the Buy button, you get your choice of currency, and it calculates your VAT and helps you with exemptions as well.

I’m also offering live online classes in Europe-friendly times: 8:00-16:00 UTC. (UTC since I’m living in Iceland at the moment.) Here’s the upcoming class schedule:

When you buy a Live Class Season Pass, you get access to all of my live online classes during your subscription. During each year, I’ll teach each of the classes at least 5 times:

  • 2 in US-friendly times, 9AM-4PM Eastern
  • 2 in EU-friendly times, 8:00-16:00 UTC
  • 1 on weekends (Fri/Sat/Sun) in US times

I schedule ’em about 3-4 months in advance, so another US-friendly rotation for the Mastering classes will start around August.

So, what changes did I make for GDPR compliance? Read the next post in the series: How I Worked Toward GDPR Compliance: The Really Long Story.


4 Things I Love About Integers #TSQL2sday

Development
4 Comments

I love integers.

Furthermore, I think we’ve all been taking them for granted.

Integers are clean, elegant datatypes that do exactly what they’re supposed to. They’re the Golden Retrievers of the database world: they have a few shortcomings, but so, so few. And so for this month’s T-SQL Tuesday, I wanna share my love for integers with you.

1. Integers have clear contents right from the start.

You can declare an integer variable or column datatype, then set it to a string – and it’ll work great as long as the string can be converted into a number:
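The original demo here was a screenshot; this is a rough equivalent, with a made-up variable name:

    DECLARE @FavoriteNumber INT = '42'; /* a string, but it converts cleanly */
    SELECT @FavoriteNumber;             /* returns 42 */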

But if you try to stuff something into an integer that doesn’t belong, you get a clear error right away:
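Again, a reconstruction along these lines:

    DECLARE @FavoriteNumber INT = 'BRENT';
    /* Msg 245: Conversion failed when converting the varchar value
       'BRENT' to data type int. */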

I can guarantee that the only thing in there is gonna be whole numbers, and that I won’t have to play any of the finicky games involved with floating point math or someone trying to store dashes in there to format their social security numbers. If you wanna do formatting on integers, you can – there are functions to help – but it’s usually pretty clear that any formatting should be done on the application side.

2. Integers rarely have implicit conversion issues.

If someone passes in a string, or even a unicode string, and tries to do an equality search on a number field, SQL Server is smart enough to convert the string to an integer first. SQL Server even leverages statistics on the converted number to estimate how many rows the search will return!
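
As a sketch of what that looks like – using the Stack Overflow demo database’s dbo.Users table, the same database I use elsewhere on this blog – the integer Reputation column gets searched with an nvarchar parameter, and SQL Server converts the parameter instead of the column:

    DECLARE @Reputation NVARCHAR(40) = N'2';
    SELECT DisplayName, Reputation
    FROM dbo.Users
    WHERE Reputation = @Reputation; /* int column = unicode parameter */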

The resulting execution plan is beautiful because SQL Server converted the parameter, not the contents of the table, and accurately estimated the number of rows:

3. Integers are small – and compress down even smaller.

Integers take up just four bytes of space, and if a few billion values isn’t enough for you, their big brother BIGINTs take up just eight bytes. Eight bytes! I see people constantly slapping NVARCHAR(100) all over the place, storing un-normalized descriptions in row after row after row, just flushing space down the drain when they could be linking to a table with the handful of descriptions that they always use.
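
To make that concrete, here’s a hypothetical sketch – the table and column names are invented – of trading a repeated NVARCHAR(100) for a one-byte key into a lookup table:

    CREATE TABLE dbo.OrderStatuses (
        StatusId   TINYINT IDENTITY(1,1) PRIMARY KEY,
        StatusName NVARCHAR(100) NOT NULL UNIQUE /* the handful of descriptions live here, once */
    );

    CREATE TABLE dbo.Orders (
        OrderId  BIGINT IDENTITY(1,1) PRIMARY KEY,
        StatusId TINYINT NOT NULL
            REFERENCES dbo.OrderStatuses (StatusId) /* 1 byte per row instead of up to 200 */
    );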

Need to save space? Row compression is a piece of cake to implement, and even BIGINT columns are just automatically stored in the smallest possible number of bytes. Row compression makes it easy to justify using BIGINTs as identity columns on tables where you’re worried about a large number of rows in the future because they just don’t take any extra space to store.

Got a lot of repeated values, and want even more compression? Enable page compression.
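
Both are single statements – here’s a sketch, reusing the hypothetical Orders table from above:

    ALTER TABLE dbo.Orders REBUILD WITH (DATA_COMPRESSION = ROW);
    /* or, when there are lots of repeated values: */
    ALTER TABLE dbo.Orders REBUILD WITH (DATA_COMPRESSION = PAGE);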

And when it’s time to build fact tables for reporting systems and data warehouses, buckle up: columnstore indexes bring really bananas levels of compression, like 70% or higher with repeated integers like order line item quantities, prices, or part numbers.
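
Creating one is a one-liner, too – a sketch with a made-up fact table name:

    CREATE CLUSTERED COLUMNSTORE INDEX CCI_OrderLines
        ON dbo.FactOrderLines;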

4. Integers even work in URLs.

If you’ve surfed Stack Overflow, you might have noticed how the URLs have integers in them, like this:
https://stackoverflow.com/questions/7395915/does-inserting-data-into-sql-server-lock-the-whole-table/

Well, you know what’s really cool? Lop off everything after the numbers, like this:
https://stackoverflow.com/questions/7395915

And go to that URL:

Stack Overflow automatically pulls up the full URL. Wanna know how they do it? Everything after the number is extraneous, because the number corresponds to the actual ID they’re storing in the Posts table! You can see the database version of the question you’re looking at by running a query:
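
It’s a plain old equality search on the Id column – something like this, using the number from the URL above:

    SELECT *
    FROM dbo.Posts
    WHERE Id = 7395915;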

And presto:

Try doing that with your fancypants strings, bucko, and welcome to URL encoding hell.


Download the Current Stack Overflow Database for Free (2021-02)

Stack Overflow
7 Comments

Stack Overflow, the place where most of your production code comes from, publicly exports their data every couple/few months. @TarynPivots (their DBA) tweets about it, and then I pull some levers and import the XML data dump into SQL Server format.

Stack Overflow’s database makes for great blog post examples because it’s real-world data: real data distributions, lots of different data types, easy to understand tables, simple joins. Some of the tables include:

  • Users – now up over 14 million rows
  • Posts – over 52 million rows’ worth of questions & answers, 143GB in just the clustered index alone
  • Votes – over 208 million rows, making for fun calculations and grouping demos

This isn’t the exact same data structure as Stack Overflow’s current database – they’ve changed their own database over the years – but they still provide the data dump in the same style as the original site’s database, so your demo queries keep working over time. If you’d like demo queries, or inspiration on queries to write, check out Data.StackExchange.com, a public query repository.

I distribute the database over BitTorrent because it’s so large. To get it, open the torrent file or magnet URL in your preferred BitTorrent client, and the 53GB download will start. After that finishes, you can extract it with 7Zip to get the SQL Server 2016 database. It’s 4 data files and 1 log file, adding up to a ~411GB database.

Want a smaller version to play around with?

  • Small: 10GB database as of 2010: 1GB direct download, or torrent or magnet. Expands to a ~10GB database called StackOverflow2010 with data from the years 2008 to 2010. If all you need is a quick, easy, friendly database for demos, and to follow along with code samples here on the blog, this is all you probably need.
  • Medium: 50GB database as of 2013: 10GB direct download, or torrent or magnet. Expands to a ~50GB database called StackOverflow2013 with data from 2008 to 2013. I use this in my Fundamentals classes because it’s big enough that slow queries will actually be kinda slow.
  • For my training classes: specialized copy as of 2018/06: 47GB torrent (magnet.) Expands to a ~180GB SQL Server 2016 database with queries and indexes specific to my training classes. Because it’s so large, I only distribute it with BitTorrent, not direct download links.

If you only have a limited amount of bandwidth, you don’t have to keep seeding the database after you get it – I’ve got it hosted on a handful of seedboxes around the world.

As with the original data dump, these are provided under cc-by-sa 4.0 license. That means you are free to share it and adapt it for any purpose, even commercially, but you must attribute it to the original authors (not me):


Happy demoing!


It’s March. We’re still quarantined. Let’s do some free classes for Europeans.

Company News
26 Comments

Last year, during the first few months of quarantines, I ran free training classes to help y’all pass the time and raise your skills.

Thankfully, I don’t need to wear this during the online classes.

It’s March 2021, and so many of us around the world are still stuck indoors, not going into work, not seeing our friends, dealing with mask breath.

So let’s do another round of free live classes – this time around, in Europe-friendly times, 8:00-16:00 UTC.

No registration required – just do your prerequisites, then show up at www.BrentOzar.com/training/live/ a few minutes before the class starts and join in on the fun.

The classes will be recorded, but the Instant Replay recordings will only be available to folks who bought the classes or my Recorded Class Season Pass.

You might be asking, “Brent, how can you keep taunting Europeans about things they can’t buy? Why bother doing classes in European times if the Europeans can’t get the recordings?”

And I’ve got some very good news about that. Stay tuned!


#TSQL2sday 136 Invitation: Blog About Your Favorite Data Type (Or Least Favorite)

T-SQL
36 Comments

T-SQL Tuesday is a monthly blogothon where we get together and write about a different topic. I’m hosting this month’s episode.

Your mission: write a blog post about your favorite data type, and schedule it for next Tuesday, March 9.

Mad about money? Feverish about float? Tell us what you use your favorite data type for, and what people need to know before they get started using it. Data types have so many interesting edge cases, like when Aaron Bertrand taught us to use lower case for our data types. (I still don’t do that, but for the record, I do feel guilty.)

The rules:

  • Your blog post has to go live on Tuesday, March 9
  • Include the T-SQL Tuesday logo, and if somebody clicks on the logo, it has to link back to this post
  • Include a link to this blog post, or after you publish it, leave a comment here so I can track you down for the roundup blog post next week
  • If you want to host a future T-SQL Tuesday episode, tweet @way0utwest or leave a comment on this post

Don’t worry about somebody else picking the same data type – this is the beauty of blogging. Readers are here for your unique take on things, in your voice. And don’t worry about repeating things that are in the documentation – nobody reads that. You could seriously copy/paste the contents of the documentation and you’d get comments saying, “Whoa, I never knew that.”

