The Surprising Behavior of Trailing Spaces

T-SQL
28 Comments

In every training class we do, I learn stuff from students. Here’s last week’s surprise.

Create a temp table, and put in a value with trailing spaces. Then query for it using the = and <> operators, using different numbers of trailing spaces:

SQL Server simply ignores the trailing spaces:

When it comes to trailing spaces, SQL Server is the honey badger

The mind-blowing part to me is the <> operator – that seemed utterly crazy to me.

And if you add another table, and join to it, guess what happens:

All of the joins work regardless of the trailing space count:

SQL Server is a joiner

This behavior is documented in KB 316626. Turns out the K in KB really does stand for knowledge.

Previous Post
Does a TempDB spill mean statistics are out of date?
Next Post
Replacing ISNULL In A WHERE Clause

28 Comments. Leave new

  • Fancy!
    Checking the LEN of each row will result in 10 as well.
    SELECT *, len(Winery) FROM #Wines WHERE Winery = ‘Cliff Lede’;
    SELECT *, len(Winery) FROM #Wines WHERE Winery = ‘Cliff Lede ‘;

    Important to know, this behavior is only for trailing spaces, not leading.

    Reply
  • LEN will be the same while DATALENGTH will depend on the ANSI_PADDING setting.

    SET ANSI_PADDING ON;
    CREATE TABLE #Wines (Winery VARCHAR(50));
    INSERT INTO #Wines VALUES (‘Cliff Lede ‘);
    INSERT INTO #Wines VALUES (‘Cliff Lede’);
    SELECT DATALENGTH(Winery) AS [DataLength], LEN(Winery) AS [LEN], winery FROM #Wines WHERE Winery = ‘Cliff Lede’;
    DROP TABLE #Wines

    GO

    SET ANSI_PADDING OFF;
    CREATE TABLE #Wines (Winery VARCHAR(50));
    INSERT INTO #Wines VALUES (‘Cliff Lede ‘);
    INSERT INTO #Wines VALUES (‘Cliff Lede’);
    SELECT DATALENGTH(Winery) AS [DataLength], LEN(Winery) AS [LEN], winery FROM #Wines WHERE Winery = ‘Cliff Lede’;
    DROP TABLE #Wines

    It is important to know options because they could change the result. And two people running same code can get different results based on Query Options in SSMS.

    Reply
  • Gets even stranger with LIKE (mentioned in the KB, but in case someone doesn’t read it) – which side of the string has bonus whitespace matters:

    SELECT
    CASE WHEN
    ‘ThisString ‘ LIKE ‘ThisString’ THEN ‘True’ ELSE ‘False’ END — This will be true
    ,CASE WHEN
    ‘ThisString’ LIKE ‘ThisString ‘ THEN ‘True’ ELSE ‘False’ END — This will be false

    Reply
  • String comparisons are weird all over the place. In the xBase languages, the longer string is truncated to the length of the shorter string and then they are compared character by character. In SQL, we decided to had the shorter string with trailing spaces. Unfortunately, there’s this problem as to whether or not your strings or read right to left, left to right or have no concept of a space and whatever the language is. This is why tell everybody to stick to the smallest subset of Latin the actually need for their data, especially when designing encoding schemes.

    Reply
    • Mr. C – I’m not gonna lie, I cringed when I saw a comment come in from you because I thought, “OH NO, I GOT SOMETHING WRONG!” And when I finished your comment, I exhaled and said, “Thank goodness. I think I did OK.”

      😀 Have a good week, sir.

      Reply
  • Chris Camilleri
    April 13, 2017 11:25 am

    I thought that was crazy the first time i saw it too. This was my issue I sent them with nvarchar and varchar and trailing spaces for when you have the trailing spaces on left and right sides of a comparison operator.

    https://connect.microsoft.com/SQLServer/Feedback/Details/3104459

    Reply
  • Steve Mangiameli
    April 13, 2017 12:45 pm

    I haven’t tested it yet, but does the collation setting matter?

    Reply
    • I was wondering this same thing.

      I’ve seen this same behavior before. But, I never did have a chance to test whether collations made a difference.

      Reply
    • No the collation doesn’t appear to matter in terms of affecting this quirk. And vice versa, the case still factors into the comparison if a case sensitive collation is used, regardless of the quirk. Here’s a set of tests I put together:

      https://pastebin.com/bahwYcye

      The quirk only seems to happen with VARCHAR and NVARCHAR, whereas CHAR and NVARCHAR are not affected.

      Reply
  • Silviu Spataru
    April 14, 2017 5:17 am

    I remember some fun times when a client imported data from Excel, and it had hard spaces in it. Some columns become more equal than others.

    Reply
  • This caught me out while testing an SSIS lookup in T-SQL… Apart from having to handle the case sensitivity differences, the fact that SSIS doesn’t ignore trailing spaces but T-SQL does, meant that I resorted to using an MD5 hash on the joins between the source and the lookup to confirm SSIS’s behaviour.

    Reply
    • Steve, on the t-SQL side, you can use QUOTENAME(t1.column1) = QUOTENAME(t2.column1) to make both sides equal on the JOIN without having to incur the overhead of an MD5 hash.

      Reply
  • http://sqlfiddle.com/#!6/7b226/4
    Here’s a fun addition, it appears there’s a difference in behavoir when using like and trailing spaces when using nvarchar vs varchar.

    Reply
  • […] The Surprising Behavior of Trailing Spaces – Brent Ozar … […]

    Reply
  • […] The Surprising Behavior of Trailing Spaces – Brent Ozar … […]

    Reply
  • When you read a littler further into the ANSI standard, you get:

    “If the length in characters of X is not equal to the length
    in characters of Y, then the shorter string is effectively
    replaced, for the purposes of comparison, with a copy of
    itself that has been extended to the length of the longer
    string by concatenation on the right of one or more pad char-
    acters, where the pad character is chosen based on CS. If
    CS has the NO PAD attribute, then the pad character is an
    implementation-dependent character different from any char-
    acter in the character set of X and Y that collates less
    than any string under CS. Otherwise, the pad character is a
    .”

    Search me but I can’t find a way to obtain the “NO PAD” attribute for the collation affecting me, which is Latin1_General_CI_AS.

    I found MS documentation around a CREATE COLLATION statement – but I am really unsure of the context here: this does not appear to be documentation relating to T-SQL (see link 1). Regardless, the syntax does allow for keywords “NO PAD” and “PAD SPACE” although this clause is optional. I would guess that 1) “PAD SPACE” is default and 2) my above collation never bothered including that clause. The syntax seems to allow for creating new collations based on existing ones. I can see documentation for another RDBMS (MariaDB) recommending creating new sets of collations suffixed with “_NOPAD” to lead the DB to behave more like what we would expect, and yet another RDBMS (MySQL) supports identifying the space padding attribute of collations. The end of the MS article states, however:

    “Microsoft SQL Server 2008 R2 and Microsoft SQL Server 2012 vary as follows: Transact-SQL partially supports this feature. Transact-SQL supports a COLLATE clause, but does not support the CREATE COLLATION and DROP COLLATION statements.”

    I guess SQL Server 2016 follows suit likewise; entering “create collation” into a SQL session comes up with syntax error highlighting.

    So I guess in the end the MS doc is right to say SQL Server conforms with the standard, except to say that at least with this one collation, they have chosen not to implement the “NO PAD” variation of the standard. A quick check of available collations on my machine yields some 3,800+ options, but with there seemingly being no documentation around checking the “NO PAD” attribute mentioned by the standard, it might be a fair guess to suggest they simply haven’t implemented that attribute at all, opting instead for the default behaviour across all collations… except that link below which clearly describes the NO PAD clause and notes T-SQL does not implement the whole CREATE/DROP functionality.

    Navigating up the document tree yields:

    “[MS-TSQLISO02]: SQL Server Transact-SQL ISO/IEC 9075-2 Standards Support Document

    “Describes the level of support that is provided by Transact-SQL in both SQL Server 2008 R2 and SQL Server 2012 for Part 2: Foundation (SQL/Foundation) of the SQL language ISO/IEC 9075 international standard specification”

    … and on the way there we find:

    “2.1.2 Optional Features”

    So I guess MS is implementing ISO/IEC 9075-2 Standards, opted not to implement the optional padding attribute and meets the minimal functionality described for character string comparisons set out in ISO/IEC 9075:1992 – that being the default behaviour of padding with the space character.

    Damn. Was hoping we could raise a bug report.

    Does anyone know if any of the existing collations do implement the NO PAD attribute? (I doubt it.) If so, one workaround would be to specify that collation, as required. Otherwise, the best workaround will depend on what you’re actually doing – and I’m sure you’ve already worked that out.

    Link 1: https://msdn.microsoft.com/en-us/library/hh544656(v=sql.105).aspx

    Reply
  • Terry Grignon
    April 16, 2019 6:50 am

    Another interesting case with this ‘odd’ behaviour is how it is dealt with by the ‘right’ function. If I use something like this for an nvarchar field with a unique constraint:

    select strvalue, len(strvalue) as [Length], datalength(strvalue) as [DataLength] from tblwithtrailingspaces where right(strvalue,1) = N’ ‘;

    I get all the rows with trailing spaces as expected but I also get an empty string (N”) which I did not. So I had to add in this to the where clause to exclude it: datalength(strvalue) > 0.

    Reply
  • Lawrence Barnes
    October 13, 2020 7:35 am

    It looks like this also applies to trailing spaces with char()
    IF OBJECT_ID(‘tempdb..#tmpdirtychar’) IS NOT NULL DROP TABLE #tmpdirtychar;
    IF OBJECT_ID(‘tempdb..#tmpcleanchar’) IS NOT NULL DROP TABLE #tmpcleanchar;
    Create Table #tmpcleanchar (val1 Char(5), val2 int)
    Create Table #tmpdirtychar (val1 Char(5), val2 int)
    Insert into #tmpcleanchar(val1, val2)
    Values(‘1’,1),(‘2’,2),(‘3’,3),(‘4’,4)
    Insert into #tmpdirtychar(val1, val2)
    Values(‘1 ‘,10),(‘2 ‘,20),(‘ 3’,30),(‘ 4 ‘,40)
    Select ‘clean’, t.* From #tmpcleanchar t
    union
    Select ‘dirty’, t.* From #tmpdirtychar t

    Select tc.*, td.*
    From #tmpcleanchar tc
    Inner Join #tmpdirtychar td On td.val1 = tc.val1

    Reply
  • ansi padding is set to on on a column and still it does not show diff between text with trailing spaces to text without

    is there a solution for that?

    Reply
  • Kev Pickering
    June 3, 2021 8:23 am

    Currently importing data from an external source using SSIS . Trailing spaces found in VARCHAR column used as a primary key. Doesn’t matter what ansi_padding is set when creating the destination table, primary key violation scuppers the import. We have no control over the external source.
    Guess the only option is to use HASHBYTES and use that as the primary key, which does work.

    Reply
  • Randy in Marin
    August 23, 2021 3:49 pm

    We used to import raw mainframe data into a SQL server every day for reports. One of the expenditure codes had a significant trailing space. If Hell has an IT department, they likely have a tag line “Trailing space is significant!”.

    Reply
  • It gets really crazy when you shift over to SSIS.

    SSIS lookups _are_ sensitive to trailing spaces (I haven’t tested with leading spaces). In other words, “A” in the main dataflow won’t match to “A ” in the lookup.

    But…. The Slowly Changing Dimension component – which you might think was basically a Lookup with fancy bits – _isn’t_ sensitive to trailing spaces. Pipe some input like “A”, “B”, “C” into it, lookup in a dimension with business key “A “, “B “, “C “, and it’ll treat them as matched: i.e. they won’t go to the New output, but to one of the others (depending on changed/identical attribute values).

    A good thing, you might think. But it all goes wrong when a fact table has an SSIS lookup on a dimension table – into which trailing spaces have somehow crept in. If the dimension load itself uses a SCD component, you’ll never get unexpected behaviour from that load – because the correct, spaceless input matches to the incorrect, trailing-spaced dimension.

    Reply
  • SQL LIKE (without wildcard, yes… i know it’s bad):

    — nvarchar
    select 1 where N’a ‘ like N’a’ — returns 1
    select 1 where N’a’ like N’a ‘ — no return (right side trailing)
    select 1 where N’a ‘ like N’a’ — no return (left side trailing)

    — varchar
    select 1 where ‘a’ like ‘a’ — returns 1 (no trailing)
    select 1 where ‘a’ like ‘a ‘ — no return (right side trailing)
    select 1 where ‘a ‘ like ‘a’ — returns 1(left side trailing) <—- WHY RETURNS 1??

    Reply
  • Cornan The Iowan
    November 6, 2023 3:19 pm

    My use case this week was with SSRS 2016, which IS sensitive to trailing spaces (I called RTRIM() to handle it), and this led me to wonder about foreign keys, not yet mentioned in this thread.

    Foreign keys also accept trailing spaces, so you can store the data with values that differ in trailing spaces.

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.