The Surprising Behavior of Trailing Spaces

In every training class we do, I learn stuff from students. Here’s last week’s surprise.

Create a temp table, and put in a value with trailing spaces. Then query for it using the = and <> operators, using different numbers of trailing spaces:

SQL Server simply ignores the trailing spaces:

When it comes to trailing spaces, SQL Server is the honey badger

The mind-blowing part to me is the <> operator – that seemed utterly crazy to me.

And if you add another table, and join to it, guess what happens:

All of the joins work regardless of the trailing space count:

SQL Server is a joiner

This behavior is documented in KB 316626. Turns out the K in KB really does stand for knowledge.

Previous Post
Does a TempDB spill mean statistics are out of date?
Next Post
Replacing ISNULL In A WHERE Clause

18 Comments. Leave new

  • Fancy!
    Checking the LEN of each row will result in 10 as well.
    SELECT *, len(Winery) FROM #Wines WHERE Winery = ‘Cliff Lede’;
    SELECT *, len(Winery) FROM #Wines WHERE Winery = ‘Cliff Lede ‘;

    Important to know, this behavior is only for trailing spaces, not leading.

    Reply
  • LEN will be the same while DATALENGTH will depend on the ANSI_PADDING setting.

    SET ANSI_PADDING ON;
    CREATE TABLE #Wines (Winery VARCHAR(50));
    INSERT INTO #Wines VALUES (‘Cliff Lede ‘);
    INSERT INTO #Wines VALUES (‘Cliff Lede’);
    SELECT DATALENGTH(Winery) AS [DataLength], LEN(Winery) AS [LEN], winery FROM #Wines WHERE Winery = ‘Cliff Lede’;
    DROP TABLE #Wines

    GO

    SET ANSI_PADDING OFF;
    CREATE TABLE #Wines (Winery VARCHAR(50));
    INSERT INTO #Wines VALUES (‘Cliff Lede ‘);
    INSERT INTO #Wines VALUES (‘Cliff Lede’);
    SELECT DATALENGTH(Winery) AS [DataLength], LEN(Winery) AS [LEN], winery FROM #Wines WHERE Winery = ‘Cliff Lede’;
    DROP TABLE #Wines

    It is important to know options because they could change the result. And two people running same code can get different results based on Query Options in SSMS.

    Reply
  • Gets even stranger with LIKE (mentioned in the KB, but in case someone doesn’t read it) – which side of the string has bonus whitespace matters:

    SELECT
    CASE WHEN
    ‘ThisString ‘ LIKE ‘ThisString’ THEN ‘True’ ELSE ‘False’ END — This will be true
    ,CASE WHEN
    ‘ThisString’ LIKE ‘ThisString ‘ THEN ‘True’ ELSE ‘False’ END — This will be false

    Reply
  • String comparisons are weird all over the place. In the xBase languages, the longer string is truncated to the length of the shorter string and then they are compared character by character. In SQL, we decided to had the shorter string with trailing spaces. Unfortunately, there’s this problem as to whether or not your strings or read right to left, left to right or have no concept of a space and whatever the language is. This is why tell everybody to stick to the smallest subset of Latin the actually need for their data, especially when designing encoding schemes.

    Reply
    • Mr. C – I’m not gonna lie, I cringed when I saw a comment come in from you because I thought, “OH NO, I GOT SOMETHING WRONG!” And when I finished your comment, I exhaled and said, “Thank goodness. I think I did OK.”

      😀 Have a good week, sir.

      Reply
  • Chris Camilleri
    April 13, 2017 11:25 am

    I thought that was crazy the first time i saw it too. This was my issue I sent them with nvarchar and varchar and trailing spaces for when you have the trailing spaces on left and right sides of a comparison operator.

    https://connect.microsoft.com/SQLServer/Feedback/Details/3104459

    Reply
  • Steve Mangiameli
    April 13, 2017 12:45 pm

    I haven’t tested it yet, but does the collation setting matter?

    Reply
    • I was wondering this same thing.

      I’ve seen this same behavior before. But, I never did have a chance to test whether collations made a difference.

      Reply
    • No the collation doesn’t appear to matter in terms of affecting this quirk. And vice versa, the case still factors into the comparison if a case sensitive collation is used, regardless of the quirk. Here’s a set of tests I put together:

      https://pastebin.com/bahwYcye

      The quirk only seems to happen with VARCHAR and NVARCHAR, whereas CHAR and NVARCHAR are not affected.

      Reply
  • Silviu Spataru
    April 14, 2017 5:17 am

    I remember some fun times when a client imported data from Excel, and it had hard spaces in it. Some columns become more equal than others.

    Reply
  • This caught me out while testing an SSIS lookup in T-SQL… Apart from having to handle the case sensitivity differences, the fact that SSIS doesn’t ignore trailing spaces but T-SQL does, meant that I resorted to using an MD5 hash on the joins between the source and the lookup to confirm SSIS’s behaviour.

    Reply
    • Steve, on the t-SQL side, you can use QUOTENAME(t1.column1) = QUOTENAME(t2.column1) to make both sides equal on the JOIN without having to incur the overhead of an MD5 hash.

      Reply
  • http://sqlfiddle.com/#!6/7b226/4
    Here’s a fun addition, it appears there’s a difference in behavoir when using like and trailing spaces when using nvarchar vs varchar.

    Reply
  • […] The Surprising Behavior of Trailing Spaces – Brent Ozar … […]

    Reply
  • […] The Surprising Behavior of Trailing Spaces – Brent Ozar … […]

    Reply
  • When you read a littler further into the ANSI standard, you get:

    “If the length in characters of X is not equal to the length
    in characters of Y, then the shorter string is effectively
    replaced, for the purposes of comparison, with a copy of
    itself that has been extended to the length of the longer
    string by concatenation on the right of one or more pad char-
    acters, where the pad character is chosen based on CS. If
    CS has the NO PAD attribute, then the pad character is an
    implementation-dependent character different from any char-
    acter in the character set of X and Y that collates less
    than any string under CS. Otherwise, the pad character is a
    .”

    Search me but I can’t find a way to obtain the “NO PAD” attribute for the collation affecting me, which is Latin1_General_CI_AS.

    I found MS documentation around a CREATE COLLATION statement – but I am really unsure of the context here: this does not appear to be documentation relating to T-SQL (see link 1). Regardless, the syntax does allow for keywords “NO PAD” and “PAD SPACE” although this clause is optional. I would guess that 1) “PAD SPACE” is default and 2) my above collation never bothered including that clause. The syntax seems to allow for creating new collations based on existing ones. I can see documentation for another RDBMS (MariaDB) recommending creating new sets of collations suffixed with “_NOPAD” to lead the DB to behave more like what we would expect, and yet another RDBMS (MySQL) supports identifying the space padding attribute of collations. The end of the MS article states, however:

    “Microsoft SQL Server 2008 R2 and Microsoft SQL Server 2012 vary as follows: Transact-SQL partially supports this feature. Transact-SQL supports a COLLATE clause, but does not support the CREATE COLLATION and DROP COLLATION statements.”

    I guess SQL Server 2016 follows suit likewise; entering “create collation” into a SQL session comes up with syntax error highlighting.

    So I guess in the end the MS doc is right to say SQL Server conforms with the standard, except to say that at least with this one collation, they have chosen not to implement the “NO PAD” variation of the standard. A quick check of available collations on my machine yields some 3,800+ options, but with there seemingly being no documentation around checking the “NO PAD” attribute mentioned by the standard, it might be a fair guess to suggest they simply haven’t implemented that attribute at all, opting instead for the default behaviour across all collations… except that link below which clearly describes the NO PAD clause and notes T-SQL does not implement the whole CREATE/DROP functionality.

    Navigating up the document tree yields:

    “[MS-TSQLISO02]: SQL Server Transact-SQL ISO/IEC 9075-2 Standards Support Document

    “Describes the level of support that is provided by Transact-SQL in both SQL Server 2008 R2 and SQL Server 2012 for Part 2: Foundation (SQL/Foundation) of the SQL language ISO/IEC 9075 international standard specification”

    … and on the way there we find:

    “2.1.2 Optional Features”

    So I guess MS is implementing ISO/IEC 9075-2 Standards, opted not to implement the optional padding attribute and meets the minimal functionality described for character string comparisons set out in ISO/IEC 9075:1992 – that being the default behaviour of padding with the space character.

    Damn. Was hoping we could raise a bug report.

    Does anyone know if any of the existing collations do implement the NO PAD attribute? (I doubt it.) If so, one workaround would be to specify that collation, as required. Otherwise, the best workaround will depend on what you’re actually doing – and I’m sure you’ve already worked that out.

    Link 1: https://msdn.microsoft.com/en-us/library/hh544656(v=sql.105).aspx

    Reply
  • Terry Grignon
    April 16, 2019 6:50 am

    Another interesting case with this ‘odd’ behaviour is how it is dealt with by the ‘right’ function. If I use something like this for an nvarchar field with a unique constraint:

    select strvalue, len(strvalue) as [Length], datalength(strvalue) as [DataLength] from tblwithtrailingspaces where right(strvalue,1) = N’ ‘;

    I get all the rows with trailing spaces as expected but I also get an empty string (N”) which I did not. So I had to add in this to the where clause to exclude it: datalength(strvalue) > 0.

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.

Menu
{"cart_token":"","hash":"","cart_data":""}