Which Should You Use: VARCHAR or NVARCHAR?

You’re building a new table or adding a column, and you wanna know which datatype to use: VARCHAR or NVARCHAR?

If you need to store Unicode data, the choice is made for you: NVARCHAR says it’s gonna be me.

But if you’re not sure, maybe you think, “I should use VARCHAR because it takes half the storage space.” I know I certainly felt that way, but a ton of commenters called me out on it when I posted an Office Hours answer about how I default to VARCHAR. One developer after another told me I was wrong, and that in 2025, it’s time to default to NVARCHAR instead. Let’s run an experiment!

To find out, let’s take the big 2024 Stack Overflow database and create two copies of the Users table. I’m using the Users table here to keep the demo short and sweet, because I ain’t got all day to be loading gigabytes of data (and reloading, as you’ll see momentarily). We’re just going to focus on the string columns, so we’ll create one copy with VARCHARs and one with NVARCHARs. Then, to keep things simple, we’ll only load the data that’s purely VARCHAR (because some wackos may have put some fancypants Unicode data in their AboutMe).
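
If you want to play along at home, here’s roughly what that setup looks like. This is a minimal sketch, not my exact demo script – the Users_VARCHAR and Users_NVARCHAR table names, the column lengths, and the round-trip filter are all just my shorthand:

/* Two copies of the string columns from Users: one VARCHAR, one NVARCHAR. */
CREATE TABLE dbo.Users_VARCHAR (
    Id          INT PRIMARY KEY CLUSTERED,
    DisplayName VARCHAR(40),
    Location    VARCHAR(100),
    WebsiteUrl  VARCHAR(200),
    AboutMe     VARCHAR(MAX)
);

CREATE TABLE dbo.Users_NVARCHAR (
    Id          INT PRIMARY KEY CLUSTERED,
    DisplayName NVARCHAR(40),
    Location    NVARCHAR(100),
    WebsiteUrl  NVARCHAR(200),
    AboutMe     NVARCHAR(MAX)
);

/* Only load rows whose strings survive a round trip through VARCHAR,
   i.e. rows with nothing the VARCHAR code page can't represent: */
INSERT INTO dbo.Users_VARCHAR (Id, DisplayName, Location, WebsiteUrl, AboutMe)
SELECT Id, DisplayName, Location, WebsiteUrl, AboutMe
FROM dbo.Users
WHERE ISNULL(AboutMe, N'')     = CONVERT(NVARCHAR(MAX), CONVERT(VARCHAR(MAX), ISNULL(AboutMe, N'')))
  AND ISNULL(DisplayName, N'') = CONVERT(NVARCHAR(40),  CONVERT(VARCHAR(40),  ISNULL(DisplayName, N'')))
  AND ISNULL(Location, N'')    = CONVERT(NVARCHAR(100), CONVERT(VARCHAR(100), ISNULL(Location, N'')))
  AND ISNULL(WebsiteUrl, N'')  = CONVERT(NVARCHAR(200), CONVERT(VARCHAR(200), ISNULL(WebsiteUrl, N'')));

/* Load the exact same rows into the NVARCHAR copy so the comparison is fair. */
INSERT INTO dbo.Users_NVARCHAR (Id, DisplayName, Location, WebsiteUrl, AboutMe)
SELECT Id, DisplayName, Location, WebsiteUrl, AboutMe
FROM dbo.Users_VARCHAR;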

Let’s compare their sizes with sp_BlitzIndex @Mode = 2, which lists all the objects in the database. (I’ve dropped everything else because YOLO.)
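
If you’re following along, the size check is just the one call from the First Responder Kit, run in the database that holds the two tables:

EXEC sp_BlitzIndex @Mode = 2;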

Table sizes compared

The NVARCHAR version of the table is bigger. You might have heard that it’d be twice as big – well, that’s not exactly true, because the rows themselves have some overhead, and some of the string values are NULLs.

The difference shows up in indexes, too. Let’s create indexes on the DisplayName column:
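
Something like this – the index names are just my own placeholders:

CREATE INDEX IX_DisplayName ON dbo.Users_VARCHAR  (DisplayName);
CREATE INDEX IX_DisplayName ON dbo.Users_NVARCHAR (DisplayName);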

Table & index sizes before compression

The NVARCHAR version of the index is 629MB, and the VARCHAR version is 439MB. That’s a pretty big difference.

I used to hate it when people said, “Who cares? Disk is cheap.”

The first problem with that statement is that up in the cloud, disk ain’t cheap.

Second, memory ain’t cheap either – again, especially up in the cloud. These object sizes affect memory because the 8KB pages on disk are the same ones we cache up in memory. The larger our objects are, the less effective cache we have – we can cache fewer rows’ worth of data.

Finally, whether the data’s in memory or on disk, the more of it we have, the longer our scans will take – because we have to scan more 8KB pages. If you’re doing an index seek and only reading a handful of rows, this doesn’t really matter, but the more data your query needs to read, the uglier this gets. Reporting queries that do index scans will feel the pain here.

But then I started using compression.

If you’re willing to spend a little more CPU time on your writes & reads, data compression can cut the number of pages for an object. Let’s rebuild our objects with row compression alone to reduce the size required by each datatype. (I’m not using page compression here because that introduces a different factor in the discussion, row similarity.)
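
Here’s the shape of it, assuming the table and index names from the sketch above:

/* Rebuild the tables (their clustered indexes) with row compression... */
ALTER TABLE dbo.Users_VARCHAR  REBUILD WITH (DATA_COMPRESSION = ROW);
ALTER TABLE dbo.Users_NVARCHAR REBUILD WITH (DATA_COMPRESSION = ROW);

/* ...and the nonclustered indexes on DisplayName, too. */
ALTER INDEX IX_DisplayName ON dbo.Users_VARCHAR  REBUILD WITH (DATA_COMPRESSION = ROW);
ALTER INDEX IX_DisplayName ON dbo.Users_NVARCHAR REBUILD WITH (DATA_COMPRESSION = ROW);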

The size results:

After row compression

Now, the object sizes are pretty much neck and neck! The difference is less than 5%. So for me, if I’m creating a new object and there’s even the slightest chance that we’re going to need to store Unicode data, I’m using NVARCHAR datatypes, and if space is a concern, I’m enabling row compression.

What about the new UTF-8 collation?

SQL Server 2019 introduced UTF-8 support to allow VARCHAR columns to store more stuff. At the time it came out, character guru Solomon Rutzky said it was tearin’ up his heart, concluding:

While interesting, the new UTF-8 Collations only truly solve a rather narrow problem, and are currently too buggy to use with confidence, especially as a database’s default Collation. These encodings really only make sense to use with NVARCHAR(MAX) data, if the values are mostly ASCII characters, and especially if the values are stored off-row. Otherwise, you are better off using Data Compression, or possibly Clustered Columnstore Indexes. Unfortunately, this feature provides much more benefit to marketing than it does to users.

Yowza. Well, we call him a character guru for more reasons than one. At the time, I wrote off UTF-8, but let’s revisit it today in SQL Server 2025 to see if it makes sense. We’ll create a table with it, and an index, and compress them:
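
Roughly like this – a sketch using one of SQL Server’s UTF-8 collations (Latin1_General_100_CI_AS_SC_UTF8 here), with my placeholder names again:

/* Note: with a UTF-8 collation, the 40/100/200 are bytes, not characters. */
CREATE TABLE dbo.Users_UTF8 (
    Id          INT PRIMARY KEY CLUSTERED,
    DisplayName VARCHAR(40)  COLLATE Latin1_General_100_CI_AS_SC_UTF8,
    Location    VARCHAR(100) COLLATE Latin1_General_100_CI_AS_SC_UTF8,
    WebsiteUrl  VARCHAR(200) COLLATE Latin1_General_100_CI_AS_SC_UTF8,
    AboutMe     VARCHAR(MAX) COLLATE Latin1_General_100_CI_AS_SC_UTF8
);

INSERT INTO dbo.Users_UTF8 (Id, DisplayName, Location, WebsiteUrl, AboutMe)
SELECT Id, DisplayName, Location, WebsiteUrl, AboutMe
FROM dbo.Users_VARCHAR;

CREATE INDEX IX_DisplayName ON dbo.Users_UTF8 (DisplayName);

ALTER TABLE dbo.Users_UTF8 REBUILD WITH (DATA_COMPRESSION = ROW);
ALTER INDEX IX_DisplayName ON dbo.Users_UTF8 REBUILD WITH (DATA_COMPRESSION = ROW);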

The resulting sizes:

UTF-8 sizes with compression

Sure, the UTF-8 versions are about the same as the VARCHAR version – but, uh, neither of those is really a savings compared to compressed NVARCHAR. What if we remove the compression from all 3 versions?
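
Same rebuilds, just with compression turned back off:

ALTER TABLE dbo.Users_VARCHAR  REBUILD WITH (DATA_COMPRESSION = NONE);
ALTER TABLE dbo.Users_NVARCHAR REBUILD WITH (DATA_COMPRESSION = NONE);
ALTER TABLE dbo.Users_UTF8     REBUILD WITH (DATA_COMPRESSION = NONE);

ALTER INDEX IX_DisplayName ON dbo.Users_VARCHAR  REBUILD WITH (DATA_COMPRESSION = NONE);
ALTER INDEX IX_DisplayName ON dbo.Users_NVARCHAR REBUILD WITH (DATA_COMPRESSION = NONE);
ALTER INDEX IX_DisplayName ON dbo.Users_UTF8     REBUILD WITH (DATA_COMPRESSION = NONE);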

The resulting sizes:

Without compression

Like Solomon wrote years ago, UTF-8 really only makes sense when you’re not allowed to use compression for some reason.

The verdict: just use compressed NVARCHAR.

If you’re positive your application will never need to store Unicode data there, sure, default to VARCHAR. You get a disk space savings, you can cram more in memory, and your index scans will be faster because they’ll read fewer pages. (Not to mention all your maintenance operations like backups, restores, index rebuilds, and corruption checks.)

But you also need to be sure that the application will never pass in NVARCHAR parameters in its queries, which introduces the risk of implicit conversion.
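
For example – this is a hypothetical sketch, not code from the demo – picture an app that declares its string parameters as NVARCHAR while the column is VARCHAR:

DECLARE @DisplayName NVARCHAR(40) = N'Brent Ozar';

/* NVARCHAR outranks VARCHAR in data type precedence, so SQL Server
   implicitly converts the VARCHAR column to NVARCHAR to do the comparison,
   which can keep it from doing an efficient index seek
   (depending on your collation). */
SELECT Id, DisplayName
FROM dbo.Users_VARCHAR
WHERE DisplayName = @DisplayName;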

I say if there’s even a chance that you’ll hit implicit conversion, or that you’ll need to put Unicode data in there, just default to NVARCHAR columns. Compress ’em with row-level compression to reduce their overhead to within a few percent of VARCHAR, and that mostly mitigates the concerns about disk space, memory caching capacity, or long index scans.

Bye bye bye.

29 Comments

  • Connor O'Shea
    October 1, 2025 4:24 pm

    Thanks for going through all the choices, Brent! Is there a reason you recommend row compression over page compression?

    • You’re welcome. As I wrote in the post, I’m not using page compression here because that introduces a different factor in the discussion, row similarity.

      • Connor O'Shea
        October 1, 2025 4:31 pm

        Ah I missed that, thanks!

      • NotQuiteRetiredDBA
        October 6, 2025 10:56 pm

        I would argue using page compression changes the argument, so it should be part of this debate. That’s because nvarchar versus varchar often still makes a big difference in storage when using page compression (unlike with row compression, as you pointed out).

        • OK, cool, you’re welcome to run your own experiments and share them with others on your blog. My experiment is finished here. Thanks!

          • NotQuiteRetiredDBA
            October 8, 2025 5:41 pm

            I stand corrected – my results with page compression on one of our tables were similar. Only ~2% difference between datatypes. Dang it! Can I keep my opinion even though the data doesn’t support it? 😉

            Row compression: nvarchar = 2,501,576 KB. varchar = 2,441,424 KB.
            Page compression: nvarchar = 1,621,896 KB. varchar = 1,589,392 KB.

            No compression: nvarchar = 4,533,776 KB. varchar = 2,813,456 KB.

          • It’s so true – doing the research really helps us break out of bad patterns. That’s how this whole blog post started, too! I was used to saying, “Just use VARCHAR until you need to store NVARCHAR later, and you probably won’t,” but after this experiment, I totally changed my viewpoint. Screw it, just use NVARCHAR.

        • I’m really new (as in, three months) to SQL Server, which is why I spend a lot of time RTFM… It turns out that page compression *includes* row (and prefix) compression, and there is even an extra compression mode for Unicode – which kicks in for in-row NCHAR and NVARCHAR values when row (or, of course, page) compression is used. I’m not sure how this SCSU algorithm differs from UTF-8, but it reportedly achieves similar compression ratios (50%) for Latin-based values.

  • I believe the nvarchar() type is designed for global character storage, so it’s more general; the varchar() type, on the other hand, is good for the English language only.

    • Don’t forget that it’s not just for English. If you want to let your inner-Middle-schooler DBA flag fly, you also need NVARCHAR(). INSERT INTO ?(?) VALUES (N'?');

    • Not quite!

      VARCHAR(..) uses a code page depending on the collation sequence in play. However, the common “Latin1” collation sequences (unsurprisingly) use code pages aimed at Latin-alphabet-based languages.

      But a VARCHAR(..) field using, say, Chinese_PRC_CI_AS collation sequence uses (I think) CP936 instead, which is (ultimately) based on the GB2312 character set and handles simplified/modern Chinese glyphs fine.

      In other words:
      – VARCHAR uses a code page depending on the collation sequence
      – NVARCHAR uses UTF-16
      – The UTF-8 collation sequence variants cause VARCHAR to use the UTF-8 code page, but have no effect on NVARCHAR.

      The advantage of UTF-16 and UTF-8 is “any language in any row” — you can have a single column that has a mixture of Chinese, Thai, Maori, Greek and emojis etc. Traditional code pages struggle, and usually fail, to handle a mixture of languages like that.

  • This post seemed to be pretty *nsync. Great article and saves the rest of us doing all the work to prove it.

  • Developers wanting to default to nvarchar without knowing why is, to me, just laziness.

    To be clear, if there is any doubt at all, I go with nvarchar. But for things like phone numbers (particularly pernicious), internal product names, internal codes used by the business, non-email usernames (not to say that a username can’t be Unicode, but they typically won’t be), enums, and other things that are just never going to have any Unicode characters – including at some non-English-language businesses that use the Latin alphabet – many devs are still going to complain just because. And you made this point.

    I think it is still best to default to varchar even if unicode ends up being more prevalent as the exception than the default. While compression mitigates the storage and memory consumption, I find pretty frequently that when you do aggregation or windowing over compressed string columns, the CPU penalty can be worse than the memory penalty, so you just end up eating more memory.

  • I really appreciate this post, thanks. My friend has also informed me that there may also be implications of using each of these data types when executing dynamic SQL (EXEC vs. sp_executesql) and whether the query plans get stored or not.

  • Very interesting and useful article on compression. However, I have let it send me down a rabbit hole that led to a warren. In checking table size changes, I discovered my main database is *massively* IO-bound, with 21.7ms Avg Read Wait and 1,177.8 Avg Write Wait. Yikes! All the other databases, bar one, are reasonable. Methinks I’ve got some work to do!

  • Tibor Karaszi
    October 2, 2025 11:17 am

    Just the fact that you specify, say, varchar(20), but can only fit 19 characters. Or 18. Or 20. Or 17. Etc. It depends.
    That alone, if forced to use UTF-8, would make me feel like Sideshow Bob stepping on that rake.

  • I’m confused. Why does row compression help? Isn’t the whole point of a variable-length column that you don’t need row compression to save unused space?

  • How does row compression on NVARCHAR affect CPU performance? Will I save on disk, use memory nearly as efficiently, and allow for Unicode characters, but need to increase my CPU count slightly, thus increasing my overall cost for licensing?

    • Eric – that’s beyond the scope of this post. (There are only so many things I can teach y’all in each free blog post – hope that’s fair!)

    • Brian Boodman
      October 8, 2025 2:12 pm

      Compression can cause CPU usage to go up, but it can just as easily cause it to go down. Even taking into account CPU concerns, Kendra Little’s general recommendation (as of 2024) is: “Please Compress Your Indexes and Shrink Your Databases if you use Azure SQL Managed Instance.” This recommendation probably also applies to other cloud hosts.

      Here, she was going a step further and suggesting page compression.

  • Important part of the post: use NVARCHAR only when there is a chance that it may contain Unicode AND you apply at least row compression AND you ensure that every junior DBA follows that rule.

    There are often columns that contain strings that are some sort of key values, are always in English (or your own language), and can’t be filled in by the user (yes, a dimension table and surrogate key would often enough be better, but …).
    Phone numbers, international country codes, currency codes, etc. are often stored as VARCHAR too.
    It makes no sense to use NVARCHAR here (you don’t want smileys or Chinese characters in a phone number, and country / currency codes are standardized Latin codes).

  • As others have said, when you should use nvarchar depends:

    * A code composed only of digits? Obviously varchar.
    * An app used only by a single US company that only does business in the US? Varchar is probably fine for every column.
    * An app that might be used by companies or users outside the US? Nvarchar.

    What is overlooked in the “disk space is cheap” analysis is, “expanding disk space (or memory capacity) is cheaper than overhauling the app(s) (and dependencies) because you have to switch to nvarchar”.

