Blog

Vertical Partitioning Is Almost Never the Answer. Here’s Why.

Partitioning
10 Comments

You’re looking at a wide table with 100-200 columns.

Years ago, it started as a “normal” table with maybe 10-20, but over the years, people kept gradually adding one column after another. Now, this behemoth is causing you problems because:

  • The table’s size on disk is huge
  • Queries are slow, especially when they do table scans
  • People are still asking for more columns, and you feel guilty saying yes

You’ve started to think about vertical partitioning: splitting the table up into one table with the commonly used columns, and another table with the rarely used columns. You figure you’ll only join to the rarely-used table when you need data from it.

However, you’re dreading the amount of code changes required. It’s gonna affect everywhere in your application that does inserts, updates, deletes, and selects. And at the end of the day, are you even sure it’s going to help performance? You don’t have the time to build a test version of the entire application with the new table structures.

Stop. Do index tuning first instead. A nonclustered index is literally a vertically partitioned copy of the table. You might even build an index with the commonly queried columns of the table – and find out that it’s being used for scans. Those scans will be faster than full table scans because the commonly-queried copy is simply physically smaller! The less columns you use for it, the faster those scans will go. Don’t do table changes that require app code changes before you’ve attended my Fundamentals of Index Tuning and Mastering Index Tuning classes.

Next, consider columnstore. SQL Server’s columnstore indexes store every column independently so that you can query just the columns you want. This can have spectacular payoffs for query performance, although not for all workloads or datatypes. Start by taking the free ColumnScore.com quiz to get a rough idea of whether your table is a good fit for columnstore indexes, and if so, check out my Fundamentals of Columnstore class.

Index changes don’t require code changes. Start there, and only after you’ve exhausted those easy wins, should you consider vertical partitioning.

Some of my posts are long and involve detailed code examples. Not this one, though, because the answer’s pretty straightforward. Vertical partitioning isn’t just hard work for you, it’s hard work for everyone around you. Do the easy stuff first, then do the stuff that might seem hard for you, but doesn’t require work from other people. Assigning yourself AND other people hard work should be your very last resort.


[Video] Office Hours: We’ve Got the (Honda) Beat

Videos
3 Comments

Meet my latest toy, a 1991 Honda Beat! I introduce it, then take your top-voted questions from https://pollgab.com/room/brento.

Office Hours with my Honda Beat

Here’s what we covered:

  • 00:00 Start
  • 02:17 Pro: Are sessions sleeping during transactions harmful, and do they prevent the transaction log from truncating? If so, how can’t I prevent them server-side?
  • 03:35 sqlknitter: We have high IO (pageiolatch_sh) causing performance sluggishness. Trying to check index fragmentation, and the query just runs. The GUI freezes. Is this potentially indicative of a highly fragmented index?
  • 04:39 We Don’t Do Corruption Checks!: I don’t trust my SAN Admin. How can I be absolutely sure that the production snapshots he gives me are the type that I can offload CHECKDB to?
  • 05:21 MyTeaGotCold: What is your preferred query hint for telling the optimizer to predict that a lot of rows will come from this query?
  • 06:02 Linux in production: The SQL Server licensing docs have confounded me. Is Enterprise $7,000 per core or is it $7,000 PER YEAR per core?
  • 06:54 BillsFan17: I receive files with hundreds of key-value pairs. I want to store them as is in two columns of a table. My boss wants me to pivot the rows to cols and merge the result into a table for easier reporting. But new keys could arrive at any time making merging difficult. Who’s right?
  • 08:15 SqlServerGuy: How do you see the SQL Server future? A lot of people are saying it not worth to learn SQL Server anymore and go for cloud data technologies such as AWS or GCP to stay competitive on the job market. Do you think SQL Server 2025 may change that?
  • 09:57 SQL_rage: I have a licensed prod server, I want to use one of its db backups to refresh my test/qa server and a dev server monthly. No SA, would I need to license either of the lower instances or would dev edition be adequate?
  • 10:59 ConsultingMadness: My friend has a boss that references older blogs that mention updating some config items which I think SSDs/shared storage make obsolete. Do you have a current recommended config resource? (My friend is trying to ensure that they’re following 2025’s best practice and not 2012’s)
  • 12:16 Janis: Hi Brent, we were wondering if you’ve ever restored the master database to a different server than the original. Specifically, we’re looking to preserve logins and their passwords during the process. Do you have any insights or best practices for this scenario?
  • 13:28 Interested App Developer: Last episode you mentioned a 0 RT/PO solution with queries + results getting dumped into a schemaless db. I was quite gobsmacked. Can you give us any more reflections/details? How did it work with non deterministic things (NEWID(), Update x where y less than now(), window funcs)?

Why I Mention My Sexuality and Gender

Personal
97 Comments

I certainly don’t discuss it often, and it’s been a few years since I wrote about my pansexuality. I hope you’ll forgive me for talking a little about it today. I promise not to make it a habit.

A friend of mine recently asked me privately, “Why would people ever talk about their sexuality publicly? And why do you? Why wouldn’t you just keep that quiet?” It was a really good question, and I was happy to answer it – and I realized I should probably share it publicly, too.

Reason #1: to put a face on “the enemy.”

Watching the Aurora Borealis in AlaskaI’ve had friends & family who have gotten programmed by some media sources. I have actually heard people I know – family included – say things like “trans athletes are a real problem we should focus on” and “drag queens are trying to program our kids.”

Me being out of the closet means that I can send a gentle reminder. You know me. You’ve read my work, you’ve watched my presentations. You know I’m not a bad guy, in the grand scheme of things, and I’ve certainly never tried to convert you to anything. (I do wish you’d ease up on the foreign keys, though.)

When someone tells you that the LGBTQ+ crowd are all the same, and that they’re your enemy, hopefully you can connect the dots and realize that no, someone’s sexuality doesn’t make someone the enemy any more than hair color does. There are LGBTQ+ folks you like, and some you don’t. There are some with an agenda, and some without – just as there are some straight people with an agenda, and some without.

I read the same news you do. These days, the LGBTQ+ community is an easy target for those looking to sow division and hatred. It’s easy to vilify people that you don’t know, and I know many of y’all don’t have close LGBTQ+ friends. (And that’s totally okay!) When you see a talking head vilifying the LGBTQ+ crowd, I hope my image pops into the back of your mind and you pause to think, “Wait, is that person saying Brent is the enemy? Because I know Brent, and I don’t think he’s the enemy. He seems kinda normal. Well, relatively speaking.”

Reason #2: to be visible for the next generation.

I know there are LGBTQ+ folks in the audience who feel alienated. They look at the data community’s blogs, YouTube channels, and conference lineups and think, “To be famous in the database community, you have to be an old, straight white guy.” (At first glance, I look like that myself!)

Brent and Yves in ChinaBy mentioning it every few years, I can send a quiet reminder to the LGBTQ+ community to say there’s a path for them up onto the stage, and that getting up here has nothing to do with your gender, sexuality, race, ethnicity, or age. Anybody can succeed in this community if they’re willing to put in the work.

If you want to ask me questions, feel free!

I promise not to judge you or get angry, no matter the question. (I don’t make that promise about database questions, ha ha ho ho.) Any question that comes from a good place is welcome. Don’t feel guilty if I don’t respond right away – I’m out in Iceland on vacation this week.

You can leave a comment on this post with a fake name/email, or post a question on Pollgab with a fake name/email. If you’d rather ask it via email, you’re welcome to email me at BrentO@BrentOzar.com too, whatever works best for you.

If you feel compelled to send me trolling or hurtful words, that’s okay too. I’m old, I’m secure in my self-image, and those kinds of things don’t bother me. I hope you eventually find peace and happiness because life is too short to waste. Get out there and have a good time doing your thing, whatever your thing is, and if you’re not sure what your thing is, you’re probably a furry, and that’s cool too.


[Video] Office Hours: Hawaiian Edition

Videos
0

I took your top-voted questions from https://pollgab.com/room/brento and answered ’em from my balcony at Disney’s Aulani Resort in Hawaii. I’m not really a Disney guy – we were there for a friend’s birthday celebration – but I tell ya what, that’s my new favorite resort in the world. The water park, food, beach, and nearby activities were great, and it has the awesome Disney service. Top notch. Anyhoo, back to tech:

Office Hours: Hawaiian Jungle Edition

Here’s what we covered:

  • 00:00 Start
  • 01:31 CuriousDBA: Do you still recommend formatting SQL Server Data, Log, and TempDB drives with 64K allocation blocks, or has latest storage advancements made performance benign between 64K and the 4K default?
  • 02:51 MyTeaGotCold: My developers are scared of seeing old data so they never query my synchronous replicas. Since I can’t 100% guarantee sync replication, how can I help them to offload queries?
  • 04:53 Ceiling DBA is Watching You Tabulate: What’s the lowest RTO/RPO that you have seen for whoops queries? How was it satisfied?
  • 06:44 Tenga Dating Simulator: Have you ever heard of Microsoft giving out Standard Edition for free to companies that buy massive amounts of Enterprise Edition? I think my DBA has been tricked.
  • 07:58 chandwich: I’ve been vocal about my desire to be a DBA at my company. I’ve developed scripts (SQL & PS) to automate discovery/troubleshooting/deployments. I’ve taken all of your classes. I changed roles to work with two MS MVPs, but I just can’t wedge my way in. What am I doing wrong?
  • 09:44 RoJo: Inherited (I did not do this) main table has 20 indexes. Is the solution to segment into tables with their own indexes or any other approach?. Not sure if splitting helps or just kicks the can down the road. The indexes are on a Person table for ID type fields to search.
  • 10:58 Dan Dan the DB Man: I noticed that SQL Server allows you the ability to register a DLL as an assembly and then use it in a stored proc or function. Do you recommend taking advantage of this feature or do you avoid it?
  • 11:51 Dopinder: What is your opinion of Optional Parameter Plan Optimization (OPPO) in SQL Server 2025 for addressing parameter sniffing?

I’m coming to Jacksonville and Raleigh for SQL Saturday! Register now.

SQL Server
0

I’m coming to Jacksonville, FL on May 2-3 and Raleigh, NC on May 16-17 for SQL Saturdays! In each city, I’m teaching a one-day pre-conference workshop on Friday.

Jacksonville Pre-Conference Workshop:
Mastering Query Tuning

You’ve been working with SQL Server for a few years, and you’re comfortable writing queries and reading execution plans.

In this 1-day session, Brent Ozar will use the same practical before-and-after techniques that he uses in his Query Challenges blog series in order to demonstrate what parts of your skills need to change as you modernize your databases.

You should be comfortable using SSMS to write multi-page queries, functions, and stored procedures. You should be comfortable identifying common query plan operators like index seeks & scans, key lookups, sorts, and parallelism, and comparing plans.

In this session, you’ll:

  • Learn the difference between early and late estimation errors
  • Learn how to choose between temp tables, CTEs, and table variables
  • Learn the best query hints to apply quickly
  • And more!

Register for SQLSaturday Jacksonville and the pre-conference workshop here.

Raleigh Pre-Conference Workshop:
Faster, Cheaper Cloud Databases

You’re managing databases that live in AWS, Azure, and Google, and you’re getting a lot of questions about slow performance and crazy costs. Your users want everything faster and cheaper, and you’re not finding good answers.

Join me, Brent Ozar, for a day of real-time demos, honest war stories, and practical fixes. I’ll tackle an Azure SQL DB Managed Instance in the morning, and then an Amazon RDS SQL Server in the afternoon. I’ll show how I use the First Responder Kit to assess the bottleneck, mitigate as much as I can with index and query tuning, and then write up a business case for management for better future budgeting.

Throughout the day, I’ll include real-life stories from my cloud clients, with names changed to protect the innocent. I’ll explain what worked, what didn’t, and what went up in expensive flames.

Faster, Cheaper Cloud DatabasesYou’ll learn:

  • How Brent uses the First Responder Kit to assess a cloud SQL Server’s wait types
  • Why each cloud vendor’s VM quirks make performance tuning tricky, and how to adapt
  • How to fix blocking, logging, and CPU issues when the hardware isn’t yours
  • How to write a persuasive performance report for management to fairly lay out where the problem is, and what steps to take next

By the end of the day, you’ll be armed with real-world techniques for keeping your cloud databases faster, cheaper, and more future-proof.

This is a live in-person class. It will not be streamed live or recorded. The class will be May 16, 9AM-5PM at Duke Health, 1st Floor Auditorium, 14 Moore Dr, Durham, NC 27703. Register here.


The 8 Worst Things Microsoft Ever Did to SQL Server

SQL Server
71 Comments

Last week I wrote about the 6 best things Microsoft ever did to SQL Server, but now we gotta pull up a chair and discuss the stinkers.

To be fair, I excluded anything that’s basically ANSI standard. I’m sorry that you don’t like functions and cursors, but the reality is that Microsoft adds that stuff because they have to. And honestly, I don’t have a problem with, say, functions or cursors – it’s Microsoft’s implementation of them in SQL Server that causes performance problems. They could write the engine in a way that was optimized for ’em – but they didn’t. Anyhoo, moving on.

#8: SQL Server on Linux – in 2016 when this feature was announced, I wrote that it was too little, too late, especially if Microsoft tried to stick with the same expensive licensing strategy, and I joked about the decision process to build it. The work required to pull this off was huge, and could have been spent better on things that people really wanted. About a decade later, the adoption speaks for itself: almost nobody uses this. The only reason this “feature” isn’t higher on the list is that I suspect it’s helpful for cloud providers to manage PaaS databases.

#7: VARDECIMAL – sweet summer child, you probably don’t remember that Microsoft brought this new datatype out in SQL Server 2005 Enterprise Edition Service Pack 2, and then promptly deprecated it in the very next version. This came and went so quickly that I don’t think it caused a ton of harm for end users, but it was a sign of bad product management. Microsoft should have held off until they could release real compression, and they shouldn’t have introduced datatypes in service packs, either – what a nightmare for failovers.

#6: Scalar Function Inlining – this SQL Server 2019 feature was such a great idea, but more ambitious than Microsoft could afford to pull off. They simply didn’t dedicate enough development or testing time to do it right, and the result was a buggy mess. For years, Microsoft has been gradually backing this feature back out, as the support KB article explains. Even the most recent Jan 2025 CU17 for SQL Server 2022 found yet another incorrect results scenario with this feature – 6 years after its release!

#5: SQL Server Notification Services (SSNS) – SSAS, SSIS, and SSRS caught on like wildfire, but Notification Services didn’t enjoy the same destiny. SSNS was supposed to let you push notifications (schedule-based or trigger-based) to emails or SSMS – it was basically AWS SNS, but built into the database server. Now, some of you are going to say, “Brent, how could you put English Query in the list of best things because it was cutting edge at the time, but then turn around and put SSNS in the list of worst things? Wasn’t it a valiant cutting edge attempt too?” The answer is no, handling complex notifications is never a good idea in the database layer. Notifications require things like retries and unsubscription handling, and a database server just has no business handling incoming unsubscription attempts. Anybody who’s ever been deeply involved in customer notifications would have immediately flagged this system as a bad idea.

#4: SQL Mail & Database Mail – right about now, you’re yelling at the screen, “Brent, you can’t say it’s a bad idea – I use this every day!” Yep, see, that’s the problem right there: this unreliable, hard-to-troubleshoot feature never should have been a part of the database server. From the start, SQL Server should have had some kind of API access to Exchange or an SMTP server. If we had that kind of API, then by now, we’d have full Slack & Teams integration for SQL Server error messages – but instead, we’ve got the stagnant, awkward Database Mail. (I thought a lot about adding Agent jobs to this worst-ideas list too, but I wasn’t quite ready to deal with y’all’s feedback about that hot take.)

#3: Calling Java from Stored Procedures – around 2019, Microsoft was putting a lot of effort into machine learning at the database level. Language Extensions would let you call R or Python code from stored procs, and I always got the feeling that they kinda threw Java in there just because they could. (I debated calling language extensions altogether a bad idea, but I think Java in particular is exceedingly bad here.)

#2: Big Data Clusters (BDC) – when it was announced for SQL Server 2019, I summed it up by saying, “It’s like linked servers, but since they don’t perform well, we need to scale out across containers.” A lot of Microsoft people got very angry with me about that summary, but I still stand by it. BDC was a ridiculously over-engineered version of linked servers, and Microsoft killed it in the very next SQL Server version.

And the #1 worst thing Microsoft ever did to SQL Server…

They keep walking away from development tooling and starting over.

  • In 2000, we built objects, functions, queries, etc in Query Analyzer.
  • 2005: the new SQL Server Management Studio replaced that.
  • 2008: Data Dude came out as an extension for Visual Studio, and database development was supposed to move over there.
  • 2010: Visual Studio 2010 included the Data Dude features in some (but not all) editions.
  • 2012-2017: a time of absolute chaos. There was an avalanche of different tooling for different versions of SQL Server, SSIS, Visual Studio, etc with names like “Microsoft SQL Server Data Tools – Business Intelligence for Visual Studio 2013”. The documentation includes this amusing confession: “Historically, the Visual Studio shell used to create SQL Server content types has been released under various names, including SQL Server Data Tools, SQL Server Data Tools – Business Intelligence, and Business Intelligence Development Studio.” The word “historically” is doing a lot of heavy lifting there – those products cycled through across the span of just a few years.
  • 2018: the new Azure Data Studio was touted as the new home for database development – not just on Windows, and not just for SQL Server, but for people using Macs, Linux, MySQL, Postgres, CosmosDB, topping their desserts, and mopping their floors.
  • And last week, Microsoft stuck a fork in Azure Data Studio, replacing it with the mssql extension in Visual Studio Code.

If you want developers to use your database, you need a consistent development tool. It’s hard for me to overemphasize how important that is. Microsoft’s constant self-destruction of the development tooling has lead to some really sad truths for the year 2025.

For example, Microsoft’s source control for databases & server config hasn’t taken a step forward in 25 years. The rest of the world has moved on – not just for the data itself, but also server settings, even all the way to things like text-file-driven cloud architectures. Oh sure, third parties like Redgate have tried to duct tape things together, but I feel bad for how much money and time they must have spent on it only to have the development environment keep moving around like some kind of Whack-a-Mole game.

That’s just one example of the problems Microsoft’s short attention span has caused – and there are many others. Educational materials can’t keep up. Online blog posts and tutorials refer to dead tooling, and the authors can’t be bothered to redo their work for Microsoft’s tool-du-jour. Community members have a hard time getting behind the new shiny thing because Microsoft has such a piss-poor track record of abandoning their development projects. (I tried hard to embrace Azure Data Studio, complete with writing blog posts that featured SQL notebooks, but gave up when Microsoft couldn’t get execution plans to work in them.)

So now in February 2025, we’re supposed to redirect our efforts to the mssql extension for Visual Studio Code. I salute those of you out there brave enough to dedicate your time to learning how to use that, adding issues to the list, and making it better in the hopes that Microsoft sticks with this plan. I say that from afar, though, because I ain’t joining you on that journey. SSDT and Azure Data Studio burned me badly enough that I’m going to stick with SQL Server Management Studio for as long as it’s available.


When I look at this list of worsts, I take a long, heavy sigh, thinking about how much planning, coding, support, and marketing work went into them. I feel bad for my friends at Microsoft who had hopes and dreams that these turkeys could fly.

But ending on a bright note: when I look at what’s new in SQL Server 2025, and I feel pretty good. I don’t think there’s anything in there that will go down as a major stinker. I don’t think the vector stuff or Fabric mirroring are going to go down as best-features-ever, by any means, but I don’t think they’re as awkward as the stuff in the list above.


SQL ConstantCare® Population Report: Winter 2025

Every quarter, we publish adoption rate data showing how quickly people are adopting new versions of SQL Server. Today it’s time for the winter 2025 version of our SQL ConstantCare® population report.

SQL Server 2022 is finally seeing some good adoption with about 21% of the market. It’s gradually taking market share from all other versions, including 2019, but 2019 still dominates with 44% of the market:

  • SQL Server 2022: 21%, up from 13% last quarter
  • SQL Server 2019: 44%
  • SQL Server 2017: 12%
  • SQL Server 2016: 15%
  • SQL Server 2014: 4%
  • SQL Server 2012 & prior: 1%
  • Azure SQL DB and Managed Instances: 2%

Here’s how adoption is trending over time, with the most recent data at the right:

SQL Server Adoption Rates over time

About a decade ago, we saw rapid-fire releases with SQL Server 2014, 2016, and 2017 all coming out in quick succession. Their expiration dates are fast approaching too:

  • SQL Server 2014 was unsupported as of July of last year
  • SQL Server 2016 will be unsupported in July of next year (2026)
  • SQL Server 2017 will be unsupported in October 2027

Those rapidly approaching dates means that the next time I publish another one of these 3 months from now, we’re likely to see a really, really big jump in the SQL Server 2022 adoption rate. While companies tend to stay away from bleeding edge stuff, 2022 isn’t seen as bleeding edge anymore, especially given that it’s had 17 Cumulative Updates so far. I bet the 2022 adoption rate will jump, even at the expense of 2019.


Who’s Hiring in the Microsoft Data Platform Community? February 2025 Edition

Who's Hiring
8 Comments

Is your company hiring for a database position as of February 2025? Do you wanna work with the kinds of people who read this blog? Let’s set up some rapid networking here.

If your company is hiring, leave a comment. The rules:

  • Your comment must include the job title, and either a link to the full job description, or the text of it. It doesn’t have to be a SQL Server DBA job, but it does have to be related to databases. (We get a pretty broad readership here – it can be any database.)
  • An email address to send resumes, or a link to the application process – if I were you, I’d put an email address because you may want to know that applicants are readers here, because they might be more qualified than the applicants you regularly get.
  • Please state the location and include REMOTE and/or VISA when that sort of candidate is welcome. When remote work is not an option, include ONSITE.
  • Please only post if you personally are part of the hiring company—no recruiting firms or job boards. Only one post per company. If it isn’t a household name, please explain what your company does.
  • Commenters: please don’t reply to job posts to complain about something. It’s off topic here.
  • Readers: please only email if you are personally interested in the job.

If your comment isn’t relevant or smells fishy, I’ll delete it. If you have questions about why your comment got deleted, or how to maximize the effectiveness of your comment, contact me.

Each month, I publish a new post in the Who’s Hiring category here so y’all can get the latest opportunities.


The 6 Best Things Microsoft Ever Did to SQL Server

SQL Server
46 Comments

This entire blog post is driven by the #1 feature in this list. I think about the #1 feature a lot, like at least once a week. I think about it so much that I had to stop and think about what other similar great things Microsoft has done over the years, and be thankful for what a nice platform this is to work with. Let’s go through 6 of my favorite Microsoft decisions.

I have to warn you: some of my takes are weird.

#6: English Query – see, I told you some were weird. Decades ago, you could ask the database server what you want in plain English, and get results back. Microsoft was way ahead of their time when they brought this feature out in SQL Server 6.5, and eventually removed it by 2005. The technology just wasn’t capable enough yet, but the idea was so good that it has to make this list. I applaud their efforts to be really cutting edge back then: they did the best they could with the technology of the time, and I’m actually thankful that they didn’t try to revisit this in SQL Server 2025. While it was a good idea to attempt this in the data layer 25 years ago, it’s not smart today because rapidly-evolving LLMs are better suited for a different layer, not integrated directly into the database layer.

#5: Always On Availability Groups (AGs, BAGs, DAGs) – when Availability Groups (AGs) came out in 2012, they aimed to replace a whole slew of high availability and disaster recovery features, each of which was based on different technologies and had different controls. Failover clustered instances, database mirroring, log shipping, bidirectional replication, and SAN replication all still survive today, but Microsoft’s continued investments in AGs meant that they could serve as the foundation for a lot of cloud implementations, including Managed Instances. I do wish they’d invested more in making the tech easy to implement, configure, and troubleshoot, but I still gotta include AGs in this list. It was the right idea, at the right time, and I was really excited when they came out.

Brent Ozar#4: SSAS, SSIS, and SSRS – it’s wild to think back to 2000-2005 when Crystal Reports dominated the data visualization business. Everybody used it everywhere. It was such a de facto standard amongst small to midsize businesses that Microsoft had to bundle SQL Server Reporting Services for free with your purchase of SQL Server just to be competitive. Similarly, they had to give away Integration Services for free to compete with tools like Informatica. For about a decade, Microsoft kept pouring improvements into these products, and if you bet your career on ’em, you did pretty well for a long period of time. Today, they’re falling out of popularity, but I’d be remiss if I didn’t include them in this list because they did so well for so long.

#3: Standard & Enterprise Edition Feature Parity – up until about a decade ago, it was a real pain in the ass to advise developers on which database features to use. Stuff like Transparent Database Encryption, partitioning, columnstore indexes, auditing, change data capture, and compression were only available in Expensive Edition. SQL Server 2016 SP1 brought those features to Standard Edition, making developers’ lives easier. I bet this decision involved a lot of meetings and arguing, but the result paid off.

#2: Dynamic Management Views (DMVs) – we take them for granted now, but back in SQL Server 2000, you couldn’t just run SELECT queries to get diagnostic data about the health and performance of your database server. Today, we use a single simple language to access our user data and our server’s metadata the same way, selecting it out, processing it for reporting, and taking actions on it. It’s not perfect, but it sure beats the DBCC commands we had back in the day.

And #1: Only Making Additive Changes to T-SQL

You probably read those six words seven times, and you’re only beginning to understand the best thing Microsoft ever did for SQL Server. To really get it, read through the tumultuous history of .NET, which went through a rollercoaster of changes. There was .NET, .NET Core, .NET Framework 4.0, back to plain .NET, with crazy changes along the way for Windows Forms, Windows Presentation Foundation, Universal Windows Platform… I can’t even begin to do justice to how much work developers have had to do to just keep shipping the same app, but with current best practices.

On the other hand, queries you wrote in 2000 still just work today, period.

Sure, you might worry about query performance due to execution plan changes, but by and large, you’ve been able to take the exact same database application and just keep connecting it to newer and newer versions of SQL Server without having to rewrite the application itself. There’s no T-SQL 2000, T-SQL Core, or T-SQL Framework 4.0. There have been deprecated features, but those are almost all management-related, not changes that would affect app code.

That’s not to say T-SQL stood still: Microsoft has been gradually adding new capabilities over the years. Not to say all of those additions have been improvements or problem-free, either – but at least they’ve been continuously additive rather than requiring application changes to maintain the same level of functionality.

Next up, we’ll revisit this topic again, but talk through the worst things Microsoft ever did to SQL Server. Buckle up!


[Video] Office Hours and an AI-Generated Sweatshirt

Videos
1 Comment

I originally took these questions from https://pollgab.com/room/brento poolside at the Trixie Motel (really), but I forgot to use the good microphone, so I re-recorded ’em from my home studio when I got back:

Office Hours: Home Office Edition

Here’s what we covered:

  • 01:04 NeedMoney: What’s the fastest/easiest way to make money as SQL Server Consultant (Production work, performance, health check, backups, maintenance)
  • 02:52 LatestGreatest: When would you recommend upgrading to a new SQL Server? Would you suggest when a new relevant feature is released or just when installed version goes end of life?
  • 04:38 PolarExpress: Would you consider/recommend a client to use SQL Server Express edition as viable solution for a small web app?
  • 05:55 ConstantCare: How much Cloud Costs (compute, storage, database) do you pay to run the Postgress database for ConstantCare?
  • 06:30 Right Arm of the Darling One: Batch Mode on Rowstore is a few years old now. Has it served you well?
  • 07:33 SadButTrue: Hey Brent, what is the most efficient way to tell if data in a table has changed since a point in time? I don’t care about what changed, just if it did change. I’m using change tracking now, but having a cursor running CT functions over 700 tables has a high CPU impact.
  • 09:04 Craig: If I have an IF statement statement that only includes a single statement, is there a reason to ever use BEGIN…END? It’s common and seems to just take up space. Also, any reason not to put simple IF statement on one line? e.g. IF NOT EXISTS (SELECT * FROM
  • #Results) RETURN;
  • 10:11 MyTeaGotCold: Do you still predict that Amazon is trying to build Aurora for SQL Server / Oracle?
  • 11:15 Handsome Consultants Aside: Hi Brent. Can you think of anything that SQL Server is the best in the market for? Oracle and Postgres seem to have it beat.
  • 13:14 Hekaton: How do people build SQL boxes with 1 TB of RAM? Every mobo I know of only had 4 slots for RAM and I have never seen a 256 GB RAM stick.
  • 14:10 DBACAT: Hi Brent, have you ever done a mashup of your top X number of roasts you do to people from office hours or other sessions? I think it would be hilarious. It’s probably a lot of work but a “The best of Brent’s roasts” would probably get a lot of views.
  • 16:14 JuniorWannabe: Hi Brent, what are your thoughts on a decent quick check if something changed recently on a SQL server? Sometimes an app stops working and it is easy to look at modified date of stored procedures or functions, but what would you look for?
  • 17:32 chris: Having followed your blog for some time I understand fragmentation is rarely the root of a performance problem these days. But, how do I handle vendors that are insistent on defragmenting the indexes? Further, what does the fragmentation per centage mean with a SAN back-end?

Index Rebuilds Make Even Less Sense with ADR & RCSI.

Index Maintenance
6 Comments

Accelerated Database Recovery (ADR) is a database-level feature that makes transaction rollbacks nearly instantaneous. Here’s how it works.

Without ADR, when you update a row, SQL Server copies the old values into the transaction log and updates the row in-place. If you roll that transaction back, SQL Server has to fetch the old values from the transaction log, then apply them to the row in-place. The more rows you’ve affected, the longer your transaction will take.

With ADR, SQL Server writes a new version of the row inside the table, leaving the old version in place as well.

Because you’re a smart cookie, you immediately recognize that storing multiple versions of a row inside the same table is going to cause a storage problem: we’re going to be boosting the size of our table, quickly. However, the problem’s even bigger than that, and it starts right from the beginning when we load the data.

ADR Tables Are Larger From the Start.

We’ll demo it by creating two databases, Test and Test_ADR. I have to use different databases since ADR is a database-wide setting. Then, I’ll create two test tables, Products and Products_ADR, and load them both with a million rows.

The end result looks like this:

Random foods

Let’s compare the sizes of the two tables using sp_BlitzIndex. The first result set is Products (the normal table), and the second result set is Products_ADR.

Initial sizes

The clustered and nonclustered indexes on the Products_ADR table are all larger because like Read Committed Snapshot Isolation (RCSI), ADR needs to add a timestamp to each row to track its version. That timestamp must take up extra space, and that’s the reason, right?

Well, not exactly – rebuild the indexes on both tables and watch what happens:

The results:

After index rebuild

The ADR version of the database plummets in size down to match the non-ADR version. That’s… odd to me. If I had an unlimited amount of time, I’d be curious to find out why that is, but I’m just a man standing in front of a database, asking it to love me, and I’m on the clock, so I gotta move on to the real problem.

I can understand why folks historically said, “After you turn on RCSI, you should rebuild your indexes to reduce space lost to page splits.” That isn’t what happened here, though – ADR was already on at the time we loaded the data, so we shouldn’t have had page splits to add versioning data. The versioning timestamps should have gone in with the initial inserts. Really odd.

I’ve also repeated this demo with additional databases with both ADR & RCSI, and just RCSI, complete with rebuilding the indexes after the load so everyone’s on the same starting point. To keep this blog post short, I’ve omitted the demo code, but you can download the full demo code here.

Sizes by feature

The results all have similar sizes:

  1. No features enabled
  2. ADR on
  3. ADR and RCSI on
  4. RCSI on

However, things start to change when the data does.

ADR & RCSI Tables Grow Really Quickly.

Let’s update 10% of the rows in all of our tables:

And then check their sizes:

After ten percent update

In the “normal” database, the object sizes remain pretty similar because SQL Server was able to update the rows in place. The nonclustered index on QtyInStock grows because about 10% of our rows are changing from 1 to 2, so we’re going to have to move them to their new sort order, which will require new pages.

On the other hand, in the databases with ADR & RCSI enabled, the object sizes exploded, nearly doubling.

If we update another 10% of the rows:

And check the sizes again:

After twenty percent update

All of the databases see growth on the QtyInStock indexes as values move from 1 to 2, different places in the b-tree, which is going to cause some new page allocations. But the clustered indexes remain the same. We already had an explosive amount of page splits there that left a lot of empty space behind, and we’re able to reuse that space for our new round of updates.

Let’s try several rounds of updates – this is gonna take a while:

And then check our sizes:

After many updates

In the ADR & RCSI databases, the clustered index has stabilized at around twice the size of the “normal” database. Is that a problem? Well… in most folks’ eyes, yes. They see this larger table as wasted space, and they want to fix it.

You Can “Fix” That With Index Rebuilds. Don’t.

To fix table bloat caused by old row versions sticking around, just rebuild the indexes:

And the objects drop back to their initial small sizes, same as right after our data loads:

After index rebuild

People get all excited, saying they’ve “saved” disk space – but guess what’s gonna happen as our user workloads start up again, updating just 10% of the rows:

ADR and/or RCSI are both right back to double the clustered index size:

After continued activity

Summary: What’s the Problem That You’re Trying to Solve?

If you’re still rebuilding indexes like it’s 2005, thinking that you’re accomplishing something, stop. Time has moved on, and best practices have moved on.

You have to zoom out and rethink about what the real problem is. You’re not really saving disk space, because it’s going to blow right back up again in no time. You’re still going to need that disk space because users gonna use.

I’m not saying you should never do index rebuilds – there are absolutely cases where you should. A good example is when rows go in with a lot of null defaults, but then over the next several minutes/hours, those nulls are populated in, and then the rows are never touched again. In that case, your older pages don’t need the extra empty space that resulted from all those early-on page splits, and an index rebuild will help cram ’em back in tightly.

In. most cases, though, if you think you’re saving space, and you’ve turned on ADR or RCSI, your index rebuild space gains are illusionary and temporary.

Update 2025-08-19: Fill Factor Helps Here Too

There was a discussion on LinkedIn about this post, and Vedran Kesegic wrote that setting fill factor to 90% would help here. Leaving 10% free on every page was enough to let SQL Server write ADR’s version information to the same page. Only the smallest index jumped in size (by 40%) because SQL Server was packing in so many rows per page that even 10% empty space wasn’t enough to keep the version info on-page.

Vedran noted that as SQL Server adds new pages to existing objects (to handle inserts or rows moving around due to key updates), the fill factor setting is ignored, and data’s still going to get pumped in with 100% fill factor. In that case, Vedran suggested index maintenance jobs with a rebuild threshold of 50% might make sense – again, using a 90% fill factor so that the newly distributed data would leave enough empty space to avoid the problem.

I’d add: remember that a 90% fill factor means you’re leaving 10% empty space on every page, which means your database just grew by 10%, your memory was cut by 10%, your maintenance jobs (backups, checkdb, index rebuilds, stats updates, etc) will take 10% longer, table scans will take 10% longer, etc. I’m not dead-set against lowering fill factor – just make sure you’re solving a problem with it rather than creating new problems.


What If You Need to Index a Lot of Duplicate Data?

Indexing
18 Comments

Let’s get a little nerdy and look at database internals. Create two tables: one with a million unique values, and one with a million identical values:

I’m just using dates here, not strings, numbers, or other values – those are great experiments for you to play around with as well, dear reader, but I have a specific case I needed to explain to a client, thus the blog post, and if you want me to write tailored blog posts to your specific scenario, you’re welcome to hire me just as these fine folks did.

Anyhoo, compare their index sizes with sp_BlitzIndex:

Identical index sizes

Both indexes are 17.7MB. Right about now, you’re thinking, “But Brent, who cares? They both store 1 million dates and 1 million IDs.” Yes, but why does SQL Server store the same date over and over?

Proving the Duplicate Storage

Shout out to AI for banging out a quick query to list the 8KB pages involved with an index:

Nice, easy-to-read results:

Index pages

We can pass the database name, file number (1), page id, etc to DBCC PAGE to see its contents:

Output type 3 (the last parameter) gives us a detailed breakdown of the page’s contents:

Page contents

And as you scroll down through the output, you’ll see TimeItHappened stored over and over, and they’re all identical. That’s… dumb.

Fixing It with Rowstore Index Compression

Use this command to enable page-level compression for each index, which uses a dictionary to avoid repeating common values:

And then recheck their sizes:

Compressed index sizes

The AllTheSame table’s index is now half the size of the AllDifferent table because AllTheSame’s values are, uh, all the same, so they get compressed much better.

On the other hand, the AllDifferent table’s values are completely, wildly different, and there’s clearly no way that they could be compressed:

 

Totally, wildly different values

Yep, no way that data could possibly be compressed. </s>

What About Columnstore?

I often say that if you really want compression, your best bet is columnstore. However, I need to do a better job of explaining that, because some kinds of data don’t compress very well. Let’s try nonclustered columnstore indexes on both tables:

And then check their sizes:

Nonclustered columnstore index sizes

In this particular situation, nonclustered columnstore indexes are actually larger than their rowstore counterparts! That’s no bueno. What about clustered columnstore?

The size results:

Clustered columnstore sizes

Now we’re talkin’: if the dates are absolutely identical, then clustered columnstore compresses the bejeezus out of these dates. However, I wouldn’t just say across the board, “You should default to clustered columnstore indexes” – because the effectiveness of them varies a lot based on the kinds of data you’re storing (not uniqueness, but more about the datatypes), how you load the data, and how you query it after it’s loaded.

In summary

If you’ve got a lot of identical data, and you need to index it, you’re going to need to tell SQL Server to use some kind of compression – either page compression for rowstore indexes, or clustered columnstore indexes.

In the case of this particular client, we were dealing with a multi-terabyte table consisting of just several columns with dates and numbers. We built out a proof of concept to choose between:

  • ~50% space savings with a page compressed rowstore index, which maintained their existing execution plan shapes, keeping query testing simple, or
  • ~75% space savings with a clustered columnstore index, which dramatically changed their execution plan shapes, which would require more dev time to test queries to make sure we didn’t get screwed when going live

I was fine either way, and the business chose to go with the former just to get across the finish line quickly. The results really are different when you change the kind of data you store, though, based on its datatypes, uniqueness, query patterns, etc. If you can run your own experiments and take the time to test stuff like this, hopefully my blog posts alone get you across the finish line. If not, you know who to call.


[Video] Office Hours in My Vegas Backyard

Videos
7 Comments

I took 5 of your top-voted questions from https://pollgab.com/room/brento while hanging out in the backyard on my second gin & tonic. I, uh, might have lost my temper on the last one.

Office Hours in my Vegas Backyard

Here’s what we covered:

  • 00:00 Start
  • 02:38 Frozt: are you familiar with SQL Server assemblies? and if they can cause assertion or IAM corruptions? Thank you Happy New year Brent
  • 04:41 SQL_Linux: Hey Brent, My company is wanting to use AI for data analytics. What common misconceptions have you seen companies hold concerning using AI to find “insights”?
  • 07:12 Culloden: my company is small and we don’t have devs for reporting. Finance team use prod for report development bc data is near real-time. I want to move them to a dev instance, but data can only be a few hours old. What are options for this? we only use simple recovery model in prod.
  • 09:38 App Developer At Crossroads: When did you decide to stick with SQL Server? After 3 jobs over 8 years all with quite different tech stacks I’m feeling like its time to become an expert in something (tech and business domain)/have a long stay at a company so I can become more and more senior/have a big impact.
  • 11:24 MyTeaGotCold: Did resumable online index rebuilds change any of your opinions on or approaches to index maintenance? I love them too much.

Can AI Rewrite Bad Queries in Seconds? Kinda.

Development
12 Comments

When I see horrific code, stuff that would take a lot of manual labor to fix, I like to lob it over to an LLM like ChatGPT just to see how it does.

For example, on a call with a client, I opened up one of their slowest pieces of code to find that it was running a cursor, processing data row by row rather than working in sets. To simulate it, I’ve written up a few stored procedures against the Stack Overflow database:

  • Posts_Update_AnswerCount takes a single PostId, and updates the AnswerCount of that row
  • Posts_Update_CommentCount does the same thing, but for a different column
  • Post_Update_AllCounts calls both of the above for a list of PostIds, looping through them one post at a time

When I look at that code, sure, conceptually I know what work has to be done – but it’s stupid manual labor. So I copied that code into a text file and asked ChatGPT, “Please rewrite this Microsoft SQL Server code into a more performant version.” That’s about as simple of a prompt as you can get! 

Here’s the code ChatGPT 4o came up with in the first pass, and you can read its full response to see its justifications:

Is it better than what we started with? Yes. Is it good enough that I wanna put it into production? No, because it’s still making two separate passes over the exact same rows in the Posts table, updating them one column at a time.

I replied to ChatGPT with, “Can you make it even faster and more efficient?” The result:

Now we’re getting somewhere! I love that it used CTEs with null protection, too, to handle the case where a post might have no answers or comments. I gave it one last shot by asking, “Could you make it faster, simpler, and easier to read?” but it reverted back to the two-update solution.

I then passed the same code and same prompts over to Google’s Gemini 2.0 Flash Thinking Experimental model. Its first result was similar to ChatGPT’s in that it had two separate update statements, but after prompting again to make it faster and more efficient, it came to this conclusion:

Instead of using a pair of CTEs like ChatGPT 4o, Gemini Flash 2.0 Thinking decided to use a pair of left join subqueries. I’m amused that both platforms chose to dump the contents of string_split into a table variable rather than using a temp table or using the string_split directly in the update’s where clause.

In both cases, the LLM (ChatGPT or Gemini) is able to change the query, but it may take multiple prompts to get to the best result – and the results don’t necessarily get better with every prompt. Today, these tools work best in the hands of someone who actually knows the results that they’re looking for, like you, dear reader.

That’s not to say you shouldn’t use them! This example here represents a really good starting point use case. In the client’s example (and this has happened several times), the LLM was able to give me a good-enough starting point in seconds, saving me a lot of time with manual labor on the rewrite. I still needed to do manual labor – I just had to do less, and that’s a win, because, uh, I’m lazy.


I’m Coming to Croatia!

Conferences and Classes
0

A few things lined up perfectly for a European trip this summer. I’m on a European cruise in early June, then Data Saturday Croatia on June 14, and then SQL Bits on June 21.

For Croatia, I’m teaching a one-day preconference workshop: Faster, Cheaper Cloud Databases.

You’re managing databases that live in AWS, Azure, and Google, and you’re getting a lot of questions about slow performance and crazy costs. Your users want everything faster and cheaper, and you’re not finding good answers.

Join me, Brent Ozar, for a day of real-time demos, honest war stories, and practical fixes. I’ll tackle an Azure SQL DB Managed Instance in the morning, and then an Amazon RDS SQL Server in the afternoon. I’ll show how I use the First Responder Kit to assess the bottleneck, mitigate as much as I can with index and query tuning, and then write up a business case for management for better future budgeting.

Throughout the day, I’ll include real-life stories from my cloud clients, with names changed to protect the innocent. I’ll explain what worked, what didn’t, and what went up in expensive flames.

You’ll learn:

  • How Brent uses the First Responder Kit to assess a cloud SQL Server’s wait types
  • Why each cloud vendor’s VM quirks make performance tuning tricky, and how to adapt
  • How to fix blocking, logging, and CPU issues when the hardware isn’t yours
  • How to write a persuasive performance report for management to fairly lay out where the problem is, and what steps to take next

By the end of the day, you’ll be armed with real-world techniques for keeping your cloud databases faster, cheaper, and more future-proof.

Registration is open now for the in-person class on June 13th at Hotel International, Miramarska 24, 10000 Zagreb. See you there!


Does Separating Data and Log Files Make Your Server Faster?

Storage
21 Comments

I’ve already explained that no, it doesn’t make your database server more reliable – and in fact, it’s the exact opposite. But what about performance?

The answer is going to depend on your hardware and workload, but let’s work through an example. I’ll take the first lab workload from the Mastering Server Tuning class and set it up on an AWS i3en.2xlarge VM, which has 8 cores, 64GB RAM, and two 2.5TB NVMe SSDs. (This was one of the cheapest SQL-friendly VM types with two SSDs, but of course there are any number of ways you could run a test like this, including EBS volumes.)

I’ll run the workload two ways:

  1. With the data files on one SSD, and the log file on another SSD, then
  2. Stripe the two SSDs together into one big volume, and put both the data files & log file on the same big volume

Here’s how long the test took for scenario 1, the data & logs separated:

Test 1: 46 minutes

And then with everything on one big striped volume:

Faster

In this test, less time equals a better result – the single big striped volume wins.

This particular workload has a storage bottleneck because in the class scenario, the database didn’t have good indexes to support the queries. SQL Server had to keep hitting the storage to read uncached data up into memory. When we only have one SSD volume to service our requests, our top wait by a long shot is PAGEIOLATCH, with about 339 seconds of that wait in a 75-second sample:

PAGEIOLATCH Waits

However when we stripe the data across two SSD volumes, that gives us more read throughput, so the test runs faster.

“But my workload is different!”

Oh yes, of course, your app’s workload is quite, uh, “special.” You should pin it on the fridge. Mom & Dad will be very proud. However, you’re still just assuming that your special workload will perform better with its storage IOPs and throughput cut in half, and its latencies doubled.

And that’s a pretty odd assumption to make.

Disclaimer: this post is specifically about separating the underlying storage, not the abstractions that you use between the disks and SQL Server itself. We’re not talking about using different Windows volumes, partitions, physical storage adapters, virtual storage adapters, or SQL Server data files. Those are all great discussions to have too, though!


Announcing the 2025 Data Professional Salary Survey Results.

Salary
8 Comments

We’ve been running our annual Data Professional Salary Survey for almost a decade, and I was really curious to see what the results would hold this year. How would inflation and layoffs impact the database world? Download the raw data here and slice & dice it to see what’s important to you. Here’s what I found.

First, without filtering the data at all, worldwide salaries are are down this year for the first time ever, but hold that thought:

Salaries overall

If we filter for just United States folks whose primary database is SQL Server or Azure SQL DB, the salaries stayed about even. Still, that’s not a great sign:

Microsoft databases

I mean, it’s weird to say “not a great sign” when we’re making six-digit salaries – a lot of folks would love to have that kind of problem.

Because this blog’s primary readership is SQL Server folks, I wouldn’t use the survey to draw conclusions about any other platform. The number of responses for other platforms is really low:

Salaries for other databases

The rest of this post has no filters – we’re looking at all countries, all database platforms. What are y’all’s job plans for this year?

Job plans for 2025

Most respondents intend to stay in the same employer, in the same role. This lines up with prior years, too:

Job plans over time

For the last several years, about 2/3 of y’all have intended to stick it out.

Folks who are planning to make a change also happen to be getting paid less – and that’s probably not a coincidence, heh. If you’re thinking about changing roles, you’re probably interested in who’s bringing home the cheddar:

Salaries by job title

Architect is generally a title that involves seniority, too, so time may play into that. Note that manager is high, too – if you manage staff, you tend to get paid more:

Manage staff

This is the second year of the survey where female pay is actually higher than male! The response rate is pretty skewed, but it always has been:

Salary by gender

People who live in large cities tend to make more:

Nearest town size

And people who work in non-profits and local governments seem to be doing pretty well:

Employment sector

Download the raw data here, and hope this data is useful to you when you have salary and career planning discussions with your manager. Here’s to you getting a raise this year!


[Video] Office Hours: Long Answers Edition

Videos
1 Comment

Today’s questions from https://pollgab.com/room/brento require a little bit longer answers:

Office Hours: Long Answers Edition

Here’s what we covered:

  • 00:00 Start
  • 03:20 Josef: What is table partitioning, and can it improve performance? Does its effectiveness depend on the underlying storage? In what scenarios is it most useful?
  • 05:10 DBAMatt: If there was a new Mt. Rushmore with the heads of SQL Server pros who’ve greatly impacted the DBA world — besides Brent Ozar, obviously—who else would you wanna see up there? Maybe Itzik Ben-Gan, Ola Hallengren, Adam Machanic, Paul Randal, or the Kim Tripp?
  • 07:53 DBA Magician Girl: What’s the sales pitch for SQL Server editions other than Dev, Standard, and Enterprise? I thought the others were just toys, but they’re signed up for your Constant Care.
  • 12:27 Juan Pablo Gallardo: An ERP package has an internal benchmark tool for SQL “speed” that runs within the ERP, a loop read of 100k records from few tables and a few copy write, results reported in millisec time for read,write,erase. My gut tells me this test is useless, am I wrong?
  • 14:32 CharoCPC: I have an 8tb DB with 2tb of free space and is part of an AG. I was thinking of truncating the largest tables, shrinking the DB, move the DB to a newly added disk, restore a copy of the DB, copy the records back into the truncated tables and rebuild indexes. Should I break the AG
  • 20:34 DirtyReadsDoneDirtCheap: What common mistakes do DBAs make in their love lives? I think I’m ready to start mine.
  • 24:43 MartinDBA: How many people behind Brent Ozar Unlimited kingdom? 26:54 DonKiddick: Hey Brent one of our SQL servers (SQL 2016) uses AlwaysOn to provide a read only secondary – but I’m being asked if queries being run on the secondary could force a bottleneck on the primary. Thank you for your time 🙂

I Feel Sorry for Untrained Developers Using Entity Framework.

Development
23 Comments

Most of the time, I love Entity Framework, and ORMs in general. These tools make it easier for companies to ship applications. Are the apps perfect? Of course not – but they’re good enough to get to market, bring in revenue to pay salaries, and move a company forwards.

However, just like any tool, if you don’t know how to use it, you’re gonna get hurt.

One classic example popped up again last month with a client who’d used EF Core to design their database for them. The developers just had to say which columns were numbers, dates, or strings, and EF Core handled the rest.

But if you create a string without specifying its length, EF defaults to using NVARCHAR(MAX). That is not a bug. That is by design, and it’s explained in the documentation:

The wording on that really pisses me off because NO, IT IS NOT SQL SERVER DOING THIS MAPPING. There is absolutely nothing in the database engine that’s saying strings are nvarchar(max). This is an Entity Framework problem, and stop pointing the blame at the innocent database engine. (Update 2025/01/10 – the documentation has been updated by Erik Ejlskov Jensen to point out that no, this isn’t a SQL Server problem.)

The documentation goes on to explain how you can manually set the right column lengths, and I’ve sat through several development conference sessions that emphasize how important it is for developers to do that. The problem here is that most folks don’t read the documentation, let alone attend conferences to learn how to use their tools. (I don’t blame the folks – I blame the companies who are pressuring developers to ship quickly without training.)

Demoing One of the Problems It Causes

Let’s create a table with two string columns – one NVARCHAR(100) and one NVARCHAR(MAX). Then, let’s load a million rows into it, putting the same contents in both the short and long columns.

Then, we’ll run identical queries against the short & long string version:

And review their actual execution plans:

The bottom query is the one that hits the NVARCHAR(MAX) column. Your first signs of danger are the yellow bang on the SELECT, and the 99% query cost estimate on the second query, indicating that SQL Server thinks the NVARCHAR(MAX) one is going to be quite a bit more expensive. However, as is often the case with SQL Server, the really big danger isn’t even shown visually.

Hover your mouse over each SELECT operator, and you’ll get a popup tooltip. One of the lines on that tooltip will say Memory Grant. Here’s the one for the NVARCHAR(100) query:

210MB memory grant

When the datatype is NVARCHAR(100), SQL Server allocates 210MB of memory to run the query because it believes it won’t need too much memory to sort that small of a column. However, check the same metric on the NVARCHAR(MAX) query:

5GB memory grant

Because SQL Server thinks that the contents of NVARCHAR(MAX) columns are larger, this query gets a 5GB memory grant, 24x larger. Depending on your server size and workloads, this can be a real problem because the more queries you have running simultaneously, the quicker your database server will run out of memory.

There’s a Fast Fix. It Doesn’t Work Here.

If you frequently search or order by a column, all you have to do is index it, right? Let’s try it:

Index creation error

The ShortString index gets created – but SQL Server can’t create an index on LongString because we can’t use NVARCHAR(MAX) as a key column in an index.

That’s a bummer, and you could argue that it’s a SQL Server limitation that could be fixed. For example, you can create a computed column on a shorter version of the column, and index that:

There’s nothing to say that SQL Server couldn’t do similar work in order to index abridged versions of NVARCHAR(MAX) columns when it could check to see if there any truly long values in that column. It just doesn’t, though, and I understand that it would require Microsoft to do some work.

It’s much easier for Microsoft to say, “Yo, EF developers, read the docs and you’ll never have problems like this.” To some extent, that’s fair, because I can see how someone would expect people to be well-trained on the tools they use every day to do their jobs. This is only one tiny example of the many, many problems you can hit with Entity Framework if you don’t get trained on how to use it first.

If you’re looking for training resources, start with anything by Julie Lerman in whatever way you enjoy consuming training material – videos, books, etc.


Who’s Hiring in the Microsoft Data Platform Community? January 2025 Edition

Who's Hiring
5 Comments

Is your company hiring for a database position as of January 2025? Do you wanna work with the kinds of people who read this blog? Let’s set up some rapid networking here.

If your company is hiring, leave a comment. The rules:

  • Your comment must include the job title, and either a link to the full job description, or the text of it. It doesn’t have to be a SQL Server DBA job, but it does have to be related to databases. (We get a pretty broad readership here – it can be any database.)
  • An email address to send resumes, or a link to the application process – if I were you, I’d put an email address because you may want to know that applicants are readers here, because they might be more qualified than the applicants you regularly get.
  • Please state the location and include REMOTE and/or VISA when that sort of candidate is welcome. When remote work is not an option, include ONSITE.
  • Please only post if you personally are part of the hiring company—no recruiting firms or job boards. Only one post per company. If it isn’t a household name, please explain what your company does.
  • Commenters: please don’t reply to job posts to complain about something. It’s off topic here.
  • Readers: please only email if you are personally interested in the job.

If your comment isn’t relevant or smells fishy, I’ll delete it. If you have questions about why your comment got deleted, or how to maximize the effectiveness of your comment, contact me.

Each month, I publish a new post in the Who’s Hiring category here so y’all can get the latest opportunities.