A perfect storm of unusual problems caused a client to ask, “What if we just used NVARCHAR(4000) as the default datatype parameter for any query, regardless of what datatype the table has in it – like numbers or dates?”
It actually works pretty well in most situations, believe it or not.
The Users table of the Stack Overflow database is helpful for demoing this because it has strings, integers, and dates. We’ll create a few indexes:
1 2 3 4 5 |
DropIndexes; GO CREATE INDEX DisplayName_NVCH ON dbo.Users(DisplayName); CREATE INDEX Reputation_INT ON dbo.Users(Reputation); CREATE INDEX LastAccessDate_DATE ON dbo.Users(LastAccessDate); |
And then we’ll create a stored procedure with a single NVARCHAR(4000) parameter. I could use three different parameters, but one is all I need here to prove the point:
1 2 3 4 5 6 7 8 9 10 |
CREATE OR ALTER PROC dbo.usp_SearchUsers @SearchString NVARCHAR(4000) AS SELECT Id FROM dbo.Users WHERE DisplayName = @SearchString OR Reputation = @SearchString OR LastAccessDate = @SearchString; GO EXEC usp_SearchUsers N'2010' GO |
The actual execution plan is delightful: it does index seeks against all three columns, regardless of their datatype:
And there’s no yellow bang warning on the SELECT operator warning you about implicit conversions. SQL Server is able to implicitly convert the NVARCHAR parameter to the appropriate INT and DATETIME datatype to match what’s in the table. The length of the NVARCHAR(40) DisplayName column doesn’t matter either: there’s no implicit conversion problem between different datatype lengths. (Of course, if you were doing an insert or delete, and you tried to insert more data than the table could handle, you’d have a problem, but that’s rather obvious.)
Problems start to arise
with VARCHAR columns, though.
The Users table doesn’t have an ancient, barbaric datatype like that, so we’ll have to add one:
1 2 3 4 |
ALTER TABLE dbo.Users ADD Location2 VARCHAR(100); UPDATE dbo.Users SET Location2 = CAST(Location AS VARCHAR(100)); CREATE INDEX Location2_VCH ON dbo.Users(Location2); GO |
And then we’ll modify our stored procedure to query just that one column:
1 2 3 4 5 6 7 8 |
CREATE OR ALTER PROC dbo.usp_SearchUsers @SearchString NVARCHAR(4000) AS SELECT Id FROM dbo.Users WHERE Location2 = @SearchString; GO EXEC usp_SearchUsers N'2010' GO |
Now, the actual plan es mal:
Now, we’re seeing implicit conversions, and these have three problems:
- We get an index scan, not a seek: we’re reading millions of rows instead of a few
- SQL Server has to do the CPU-intensive work of upconverting every VARCHAR Location2 value up to match the NVARCHAR parameter of the stored procedure
- Our estimates can be bad because SQL Server doesn’t know what it’s going to find after that upconversion happens
That alone is bad, but it gets worse.
Mo columns, mo problems.
Let’s go back to our multi-use stored procedure that searches across multiple columns, and this time let’s add in Location2:
1 2 3 4 5 6 7 8 9 10 |
CREATE OR ALTER PROC dbo.usp_SearchUsers @SearchString NVARCHAR(4000) AS SELECT Id FROM dbo.Users WHERE DisplayName = @SearchString OR Reputation = @SearchString OR LastAccessDate = @SearchString OR Location2 = @SearchString; GO EXEC usp_SearchUsers N'2010' |
Now, the actual execution plan doesn’t even bother with nonclustered indexes at all, and it’s gone from mal to muy mal:
Ouch. At least SQL Server gives us the courtesy of the yellow bang, though.
The moral of the story: NVARCHAR parameters can be alright if you’re only searching NVARCHAR, dates, integers, etc, but be wary of which datatypes can be implicitly converted, and even when they can be converted, they can still result in bad estimates and scans.
Update 2020-05-27: David Barbarin blogged about a real-world scenario where this exact problem struck: someone used an NVARCHAR parameter thinking it would be universally converted (and it is), but backfired due to implicit conversion.
14 Comments. Leave new
Thank you, Brent – another (of many) great article. I appreciate all you share with all of us.
Hi Brent. Why do you consider VARCHAR as “ancient, barbaric datatype” (apologies if I don’t get any intended joke!). Thanks
Chris – most devs seem to think these days that they need to be able to store Unicode stuff at the drop of a hat.
IMHO, unless you need to store Unicode, I always create VARCHARs. I get more efficiency from logical reads by using VARCHAR but I have not researched if that is universally true
Glad I was beat to the punch. I came here with the same question.
>> unless you need to store Unicode, I always create VARCHARs
I was under the same impression and do the same. I have a faint memory of seeing this demonstrated by someone doing a varchar vs. nvarchar head to head test. It might have been Brent in a past vid or in person class but its been a while.
>> most devs seem to think these days that they need to be able to store Unicode stuff at the drop of a hat.
nah, devs are reacting to the wacky requirements PMs or customers asking for that feature when the app will rarely or NEVER store unicode. I fight this battle regularly. Project manager: we need to store unicode. me: are you 100% sure, this will have an impact on what data type is used, and plan on 2x the RAM, 2x the disk space, etc. PM: Oh. OK Maybe not.
I too came to the comments to see how far Brent’s tongue was in his cheek regarding varchars. And here I thought I would never be considering Brent’s tongue. This life as a DBA just keeps getting weirder.
Chris – good news! You’ve only been wrong for the last year: https://docs.microsoft.com/en-us/sql/relational-databases/data-compression/unicode-compression-implementation?view=sql-server-ver15
😉
I should have my customers updated to SQL Server 2019 around the year 2025 or so! 🙂 Until then I’m stuck with varchar barbarism!
I, and so many others, are in Jeff’s world! He won’t less us leave
I know I should try this myself but would changing the where clause so it casts the variable to varchar help?
SELECT Id
FROM dbo.Users
WHERE DisplayName = @SearchString
OR Reputation = @SearchString
OR LastAccessDate = @SearchString
OR Location2 = cast(@SearchString as varchar(100));
I’m also a little confused why implicit conversion between nvarchar and varchar causes an implicit conversion *with* a yellow bang (and the other negative side effects) while an implicit conversion between nvarchar and date/int *does not* cause a yellow bang? There must be something expensive about going from nvarchar to varchar but not nvarchar to int/date.
In fact, the matrix on the “Data type conversion page” just says they’re all implicit conversions without a distinction between going from nvarchar to varchar/int/date (but I’ve got old man eyes and I’m color blind and those little green (?) circles are tiny.
Jeff – great questions, and they’re beyond the scope of this post, but I dig into ’em in my Mastering Query Tuning course.
Jeff,
It has to do with Data Type precedence:
https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-type-precedence-transact-sql?view=sql-server-ver15
If the data type of the field in the table has a higher precedence than the data type of the @variable that it is being converted to, the @variable gets converted to the data type of the field in the table (and that is good as the indexes (if any) can be used effectively). However if the @variable has higher precedence, well just like Brent showed, no bueno! As the data type of the field in the table gets converted to match the @variable and the index is no longer as effective.
Thank you Steven!
[…] couple of weeks, I read an article from Brent Ozar about using NVARCHAR as a universal parameter. It was a good reminder and from my […]