MAX Data Types Do WHAT?

Users are horrible, I know

You can just never rely on them to pass in reasonable search strings. When you write stored procedures to retrieve information for them, it’s really tempting to define all your string variables as MAX length. As long as the actual data type, N/VARCHAR matches, you’re cool, right? Avoid that pesky implicit conversion and live another day.

Well, not really. Let’s test it out. We’ll be using StackOverflow, and hitting the Users table. We have a column, DisplayName, that’s an NVARCHAR(40), which sets us up well enough to demo.

My Favorite Martian

There’s a user with the handle Eggs McLaren, which cracks me up. It reminds me of this old Red Stripe commercial. I use him for all my demos, even though he doesn’t really exist.

Well played, Jeff.

Well played, Jeff.

So what happens when we go looking for Eggs?

We get the correct results, but the execution plan has some bad news for us.

I prefer my queries unfiltered.

I prefer my queries unfiltered.

In short, we read everything in the clustered index. This could be mitigated with a smaller index, sure, but you’d still read all 5.2 million rows of it, and pass them into a filter operator. I’m using the clustered index here to highlight why this can be extra bad. We read and passed 22 GB of data into that filter operator, just to get one row out.

Why is this bad, and when is it different?

SQL Server makes many good and successful attempts at something called predicate pushdown, or predicate pushing. This is where certain filter conditions are applied directly to the data access operation. It can sometimes prevent reading all the rows in a table, depending on index structure and if you’re searching on an equality vs. a range, or something else.

What it’s really good for is limiting data movement. When rows are filtered at access time, you avoid needing to pass them all to a separate operator in order to reduce them to the rows you’re actually interested in. Fun for you! Be extra cautious of filter operators happening really late in your execution plans.

Even adjusting the variable type to NVARCHAR(4000) gets us there. If your users need to pass in search strings longer than 4000 characters, you have some serious thinking to do.

And a missing index request. What a peach.

And a missing index request. What a peach.

Rather than 22 GB, we’re looking at passing 4850 bytes to the next operator. This seems much more optimal to me. Next up is figuring out which columns we actually need, so we’re not running SELECT * every time.

That’s another bummer.

Thanks for reading!

, , , ,
Previous Post
PasteThePlan Update: See the Most Recently Pasted Plans
Next Post
Thoughts About the #SQLSummit Day 1 Keynote

22 Comments. Leave new

  • Hi Erik,

    10/10 post, would rep IRL.

    Can you explain or link us to some stuff that explains why the query engine is able to push the predicate down when switching to nvarchar(4000) but not with nvarchar(max)?

    P.S. do you know of any good gyms in Seattle? Preferably one without a lunk alarm. Asking for a friend.

    Reply
  • Very helpful post. I was in a meeting yesterday where a developer asked me the ramifications of using nvarchar(max) vs nvarchar(255) and I wasn’t sure so I told him I’d look it up and get back to him. I sent him a link to your post this morning! I usually learn something new every week from you guys.
    Thanks!

    Reply
  • Duane Lawrence
    October 20, 2016 12:15 pm

    Where is the rest of “MAX Data Types Do WHAT”? I have been trying to educate my developers that MAX anything runs slower. They will listen to Brent because they know him and respect his publications.

    Reply
  • >> If your users need to pass in search strings longer than 4000 characters, you have some serious thinking to do.

    But how about when you’re passing in to SSRS an “all items in dropdown” parameter? SSRS then passes in all items in a list. This could easily surpass 4000.

    Is there a way to change a SSRS “all” parameter to something that could be caught and then use dynamic SQL to leave off the column in the WHERE clause?

    Reply
    • Hi Pat,

      I couldn’t tell you which end of SSRS is up. I’ve never touched it, and hope I never have to.

      Use Tableau instead? 😀

      Thanks!

      Reply
    • Yes Pat, you could populate the dropdown with an “All” alternative at the very top and handle this in the T-SQL for the dataset eg. “… where (o.code = @p_code or @p_code = ‘ALL_CODES’).

      Reply
      • this is the correct solution.

        Reply
        • Hmmm, the SSRS parameter has an option for “allow multiple values”. Then the dropdown is automatically populated with a “select all” option. The dropdown appears as a checked listbox. I wonder if there’s a way to not allow multiple values and manually populate a select all to pass a chosen parameter, and still allow multiple values to be chosen?

          Reply
  • I assume the predicate push down will also get bypassed if the reverse happens, i.e. the column is an NVarChar(MAX) and the sting is NVarChar(40).

    Reply
  • Nice! Thanks!

    Reply
  • I tried to reproduce this and could reproduce a plan difference but not any actual performance difference when measured against dm_exec_requests with buffers cleared and plan cache cleared.

    Is there a script you can share that reproduces the dbo.Users object and associated indexes and data? I am happy to share my script too to see if we can get to the same point.

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.

Menu
{"cart_token":"","hash":"","cart_data":""}