Implicit vs. Explicit Conversion

Everyone knows Implicit Conversion is bad

It can ruin SARGability, defeat index usage, and burn up your CPU like it needs some Valtrex. But what about explicit conversion? Is there any overhead? Turns out, SQL Server is just as happy with explicit conversion as it is with being passed the correct datatype in the first place.

Here’s a short demo:

One table, one million rows, two columns! Just like real life! Let’s throw some queries at it. The first one will use the wrong datatype, the second one will cast the wrong datatype as the right datatype, and the third one is our control query. It uses the right datatype.
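The demo scripts aren't reproduced here, but a minimal sketch of the setup and the three queries might look like this (the index name and column lengths are illustrative assumptions, not the original demo's):

```sql
-- Illustrative setup: a VARCHAR column we'll query with mismatched types.
CREATE TABLE dbo.Conversion
(
    Id INT IDENTITY(1, 1) PRIMARY KEY,
    TextColumn VARCHAR(20) NOT NULL
);
CREATE INDEX ix_TextColumn ON dbo.Conversion (TextColumn);

SET STATISTICS TIME, IO ON;

-- 1) Wrong datatype: the VARCHAR column gets implicitly converted
--    to NVARCHAR on every row, so the index can't be seeked.
DECLARE @Wrong NVARCHAR(20) = N'A';
SELECT COUNT(*) FROM dbo.Conversion AS c1
WHERE c1.TextColumn = @Wrong;

-- 2) Explicitly cast the wrong datatype to the right one:
--    only the parameter is converted, once, and the seek survives.
SELECT COUNT(*) FROM dbo.Conversion AS c1
WHERE c1.TextColumn = CAST(@Wrong AS VARCHAR(20));

-- 3) Control: the right datatype from the start.
DECLARE @Right VARCHAR(20) = 'A';
SELECT COUNT(*) FROM dbo.Conversion AS c1
WHERE c1.TextColumn = @Right;
```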

The results shouldn’t surprise most of you. Going by STATISTICS TIME and IO, the first query is El Stinko. The other two were within 1ms of each other, and their reads were identical across every execution: very little CPU, far fewer reads.

So there you go

Explicit conversion of parameter datatypes doesn’t carry any horrible overhead. Is it easier to just pass in the correct datatype? Yeah, probably, but you might be in a position where you can’t control the incoming parameter datatype, yet you can still CAST or CONVERT it where it touches data.

Thanks for reading!

Brent says: the key here is that we’re taking an incoming NVARCHAR variable, and casting it in our query to be VARCHAR to match the table definition. This only works if you can guarantee that the app isn’t going to pass in unicode – but in most situations, that’s true, because the same app is also responsible for inserting/updating data in this same table, so it’s already used to working with VARCHAR data. Also, just to be clear, Erik’s talking about casting the variable – NOT every row in the table. That part still blows.


10 Comments

  • Thanks for your post.

    The main thing isn’t implicit vs. explicit; the problem here is where the conversion happens. If your query converts [c1].[TextColumn], your SARGability is down the toilet anyway, implicit or explicit.

    Implicit conversion is governed by data type precedence, whereas with explicit conversion we can convert to whatever we want, as long as the actual data fits in the target data type.

  • John Hennesey
    May 12, 2016 11:21 am

    +1 for making me do a double-take on the Valtrex reference. “Did he really just say…” 🙂 Always witty yet informative – thanks for all the great work you guys do!

  • Mad Sql Dba
    May 12, 2016 12:06 pm

    Just thought I’d add a link to the Data Type Precedence reference:
    https://msdn.microsoft.com/en-us/library/ms190309.aspx

  • Interesting, thanks for the post.

    Your examples use variables in the WHERE clause; interestingly, this also applies to string literals in the WHERE clause.

    Example.

    SELECT *
    FROM [dbo].[Conversion] AS [c1]
    WHERE [c1].[TextColumn] = 'A'
    GO

    Table 'Conversion'. Scan count 1, logical reads 63, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 15 ms, elapsed time = 7 ms.

    vs

    SELECT *
    FROM [dbo].[Conversion] AS [c1]
    WHERE [c1].[TextColumn] = N'A'
    GO

    Table 'Conversion'. Scan count 1, logical reads 739, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 47 ms, elapsed time = 116 ms.

    I’m going to have to go back and revisit some of my slow running queries.

  • Like others have already hinted, the difference in performance has nothing to do with implicit or explicit, but with what is actually converted. SQL Server will always do a data-safe convert for an implicit conversion, so in the ‘slow’ query, SQL Server performs a million conversions (one for each row), while the ‘fast’ queries only convert the parameter, once. If you swap the datatypes in your test, so the column is of type NVARCHAR, you will see that all queries are just as fast, because now in all cases the parameter is converted and not the values in the table.
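A sketch of the swap this commenter describes (assuming a demo table of the same shape, with illustrative names):

```sql
-- With an NVARCHAR column, the VARCHAR parameter is the side that
-- gets implicitly converted (VARCHAR ranks lower in data type
-- precedence), so the conversion happens once, not once per row.
CREATE TABLE dbo.ConversionN
(
    Id INT IDENTITY(1, 1) PRIMARY KEY,
    TextColumn NVARCHAR(20) NOT NULL
);
CREATE INDEX ix_TextColumnN ON dbo.ConversionN (TextColumn);

DECLARE @v VARCHAR(20) = 'A';
SELECT COUNT(*) FROM dbo.ConversionN AS c1
WHERE c1.TextColumn = @v;  -- parameter converted once; index seek survives
```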

  • The article is great, but what is the actual proposed solution for solving this?

    • Make your datatypes match.

      • Thanks, my question wasn’t looking for a conceptual answer; it’s more case-specific, namely:
        an API call using the JDBC driver passes an NVARCHAR(4000) variable for a VARCHAR(25) column in a table with 20 million rows. Needless to say what that does to performance.
        There isn’t a way to cast/convert the variable at the driver level, which is where the query is generated, for each transaction.
        That was my original question; I already have the conceptual answer of matching datatypes.

        • Erik Darling
          June 2, 2016 4:15 pm

          If you can’t change the API call to use VARCHAR, you could change the column datatype to NVARCHAR, or make a computed column based on it, casting it as NVARCHAR, and then change the query to point to that column. But if you can’t change the API call, I’m assuming you can’t change the query.

          So, process of elimination…
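The computed-column idea Erik describes could be sketched like this, with hypothetical table and column names (the CAST from VARCHAR to NVARCHAR is deterministic, so the computed column can be indexed):

```sql
-- Hypothetical sketch: expose an NVARCHAR shadow of the VARCHAR
-- column so the driver's NVARCHAR parameter matches a column type
-- directly, instead of forcing a per-row conversion.
ALTER TABLE dbo.BigTable
    ADD TextColumnN AS CAST(TextColumn AS NVARCHAR(25));

CREATE INDEX ix_BigTable_TextColumnN ON dbo.BigTable (TextColumnN);

-- Point the query at the computed column; the incoming NVARCHAR
-- parameter now matches the column's type, so the seek survives.
DECLARE @Param NVARCHAR(4000) = N'A';
SELECT *
FROM dbo.BigTable
WHERE TextColumnN = @Param;
```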

          • Hi Erik,
            thanks for your reply.
            Changing the datatype on a production table is not an option.

            Implicit conversion seems to be a very common issue, usually solved on the API side by updating to the latest JDBC driver. But that driver update didn’t solve it, and the problem persists with high CXPACKET waits.
            Thanks for the answer though.
