T-SQL Has Regex in SQL Server 2025. Don’t Get Too Excited.

SQL Server 2025, T-SQL
23 Comments

Regular expressions are a way of doing complex string searches. They can be really useful, but they have a reputation: they’re hard to write, hard to read, and they’re even harder to troubleshoot. Once you master ’em, though, they come in handy for very specific situations.

This post isn’t about their complexity, though. This post is about Azure SQL DB & SQL Server 2025’s regex performance.

In this case, regular expressions are like Everclear.

Everclear is a brand of alcohol, and when you read that, you might think, “Sounds interesting, I wonder what it tastes like.” It’s marketed as odorless, flavorless, and colorless. I say “marketed” because if you’ve ever been around it, you will not use terms like odorless or flavorless. To get a rough idea of what it’s like, watch The Unemployed Wine Guy try it.

See, Everclear is 95% pure alcohol by volume. It’s so potent that many states have outlawed it. You know Nevada, the state with gambling, weed, hookers, and assault weapons? Yeah, no Everclear allowed here. Gotta draw the line somewhere.

You’re not supposed to drink Everclear straight, but that trained professionals are supposed to use it – in very small quantities, in combination with other ingredients.

This? It's just coffee, as far as you need to know.

Let’s give regex a taste.

We’ll use the Stack Overflow 2010 database (the small one) in Azure SQL DB. The Users table on it has about 9M rows amounting to about 1GB in size.

Say we’re trying to find out if any sneaky users have been putting PII in places that it shouldn’t belong, like this week’s Query Exercise challenge. We want to find all Users.Location values where the user put in something resembling an email address. We’ll put an index on the Location column to give Azure SQL DB a fighting chance, and to give it statistics so it can better estimate how many rows might match a predicate, then run our query:

The resulting execution plan is really disappointing:

Regex query plan doing clustered index scan

SQL Server could have scanned the Location index rather than the full table, but it chose not to.

Update 2025/12/02 – Good news! SQL Server 2025 RTM fixes half of the problem here. In the actual query plan, it still scans the table rather than touching the index, but at least the CPU usage is waaaay down.
SQL Server 2025 RTM

I blogged more about SQL Server 2025’s implementation (and its remaining problems) here.

Why’s the performance taste so bad?

Reader: “It’s because you asked for SELECT *, and your index only has the Location column.” Not so fast – I only asked for 100 rows, which means that Azure SQL DB could have scanned the index to find the first 100 matching rows, and then done key lookups to fetch the columns, and it would have been faster. Let’s prove it:

And the abridged version of STATISTICS IO shows that the index would have been way, way less page reads:

And the query plan can indeed happily use the index. The top plan is the default, the bottom plan is with my index hint:

Default plan on top, index hint on the bottom

Reader: “So why is Azure SQL DB avoiding the index? Is the tipping point involved?” If Azure SQL DB believes a lot of rows will match your query’s predicate, then can veer towards clustered index usage rather than nonclustered indexes. However, look at those plans again: they can’t produce more than 100 rows! My query is specifically asking for the first 100 rows. There’s no sort. Just gimme the first 100 rows that match. I absolutely guarantee that this query could not produce more than 100 key lookups.

Reader: “Then why is Azure SQL DB avoiding the index then?” Well, when Microsoft coded the row estimates for regex, they made some unbelievably drunken decisions, perhaps involving chugging straight Everclear. Let’s run a different query to expose what’s happening:

The number to pay attention to here is the estimated number of rows that will match. Hey, Azure SQL DB, how many rows in our Location column will match this regex search?

Bad estimate

Azure SQL DB believes that 2,675,250 rows will match – when only 68 do. It’s off by a factor of 39,342X!

Performance is bad because Microsoft got lazy – again.

That 2,675,250 row estimate will be familiar to those of you who shelled out money for my Fundamentals of Query Tuning class. That number just happens to be exactly 30% of the row count of the Users table.

When Microsoft coded the row estimates for their fancy new regex functions – like many other kinds of search predicates – they simply hard-coded it and called it a day. No matter what your regexp_like query is looking for, Azure SQL DB guesses that 30% of the rows match. Sometimes, that estimate is fine – but most of the time, it’s not.

This is the kind of performance issue that doesn’t seem bad at small data sizes, but as your data grows, it –

wait a second

hang on

Our table only has 9 million rows, and it’s less than 1GB in size, and the regexp_like takes sixty seconds of CPU time!

High CPU time

That’s… terrifying. Sixty seconds of CPU time to query a 1GB table. That isn’t gonna scale, full stop. I mean, it’s fine for one-off utility queries on really small amounts of data, like if you’ve filtered the rowset down to a manageable number (like, 1,000 rows), but nothing you’d want to put into production on an OLTP system.

Plain ol’ queries are way better.

If you only need to find out if PII is getting out there, try a regular like query instead:

Reader: “Whoa Brent, you can’t start a like search with a leading wildcard! That’s bad!” Seriously, you gotta shut your pie hole every now and then, or at least go buy my Fundamentals classes, because you’re in for a big surprise. That query doesn’t have an index hint, and yet, Azure SQL DB is smart enough to use the index:

LIKE query plan uses the index

And it does less logical reads, and runs in just a couple seconds:

Are the results the same? No – but it’s enough to solve the business problem and tell us that yes, people are indeed using email addresses in their Location columns. (I’m not putting those here on the blog, but you’re welcome to run the query yourself to see the results.)

That’s why I say regex is like Everclear.

Everclear isn’t inherently bad or evil – at least, not any more than gambling, weed, hookers, and assault weapons. (You hear me, Nevada?) You just have to be really careful when you use it, and understand what you’re in for. It shouldn’t be your first resort on a typical Saturday night – try other solutions like gin & tonic instead.

Similarly, regex in Microsoft databases can be fine, too – in combination with other ingredients. It shouldn’t be your first resort, and it certainly shouldn’t be the only thing in your where clause or joins. If you have a bad experience with it, your first resort should be to take it out, and try an alternative solution.

And whatever you do, don’t use regex and Everclear together.

Update 2025-07-05: Louis Davidson experimented with the new ASSUME_FIXED_MIN_SELECTIVITY_FOR_REGEXP and ASSUME_FIXED_MAX_SELECTIVITY_FOR_REGEXP query hints. It doesn’t improve the performance of queries like this blog post, but if you’re going to use REGEXP_LIKE in your code, they could be useful hints for tuning other queries.

Previous Post
Query Exercise: Looking for Email Addresses
Next Post
[Video] Office Hours: Waiting for the Hot Tub to Fill

23 Comments. Leave new

  • Just want to say I love the analogy here to Everclear (and as always the quality of your posts)

    Reply
  • Ah never heard of Everclear but had heard that some Polish vodka’s can be quite strong… “ Polmos Spirytus Rektyfikowany”. Shame Microsoft didn’t take the ^ to work out it was at the start of the string and this do an index seek.

    Reply
  • Alan Cranfield
    March 19, 2025 4:17 pm

    Everclear is great fun at a firepit… Take a mouthful and blow it on the flames.. ! Awesome party trick and smells nicer than gasoline. the Everclear and T-SQL regex analogy resonates. (watch your hair and eyebrows)

    Reply
  • Reminds me of the old joke: I had a problem which I attempted to solve with regex. Now I have two problems.

    Reply
  • A friend tells me that back in the early 80’s, when the legal drinking age was 18, a beverage preferred by high schoolers at Senior Ditch Day, post-Prom parties, beach weekends, etc. was a 2-liter bottle of Sprite with a pint of Everclear / grain alcohol poured into it. Lemon-lime fresh and about 45 proof.

    Reply
  • While REGEX baked in SQL Server will have it’s uses for say data analysis, I shudder when I think about some silly dev using it in production application code queries… because… well you know it’s there.

    Just write a nice C# (my flavor, yours may vary) application to read the table into memory and use Regex there. That would be a lot faster (Threading is a lot easier in most programming languages) and frees up the SQL Server resources for more… important matters such as querying data.

    Reply
  • Thanks for the writeup and testing Brent. Unfortunately there are some special cases when RegEx is way more comfortable to use like using lookaheads and lookbehinds or just keeping whitelisted characters in a string (for example remove “,” when you are doing a csv so that cannot lead to confusion).

    That said I have done regex within TSQL for over a decade using CLR functions. It works but well isn’t perfect and the CLR get’s called for every row separately. I would have to test but still hope that here the integrated regex will show better performance in most cases that the CLR approach despite the hard coded estimate.

    Reply
  • […] T-SQL Has Regex. Don’t Get Too Excited. (Brent Ozar) […]

    Reply
  • Great post!

    Even LIKE() with basic wildcards has it challenges, similarly using a function in WHERE

    its great that there’s now regexp support in T-SQL, but seems like its best applied on
    – a very small number of rows if used directly
    – used as a second filter on some results once the main query predicates have narrowed the dataset down several orders of magnitude

    Reply
  • Could be more useful if a regexp could be applied as an index or maybe a view
    – clearly relies on there being no demand for queries coming in with ad-hoc regexps

    an index using a regexps could perhaps filter out “bad” data
    – phone numbers, zip codes, email addresses etc. that are just syntactically incorrect
    – free text fields could be scanned for potential PII patterns

    Reply
  • I downloaded the public preview of SQL Server 2025 and had a play.

    I tried

    SELECT SUM(REGEXP_COUNT(Location, ‘United Kingdom’)), SUM(DATALENGTH(Location))
    FROM dbo.Users
    WHERE Location IS NOT NULL

    CPU time was 7203 ms for 130205 rows. (Final result 9768)

    Looks like there must be some huge “per row” overhead (maybe recompiling the regex from scratch each time)

    as running

    SELECT REGEXP_COUNT(STRING_AGG(CAST(Location AS NVARCHAR(MAX)), ”), ‘United Kingdom’)
    FROM dbo.Users

    was ~400 ms

    (Just to get an indicative idea of whether it was the quantity of text or number of invocations that is the primary problem)

    I hope they improve this before GA!

    Reply
  • Two new USE HINTs are available
    ASSUME_FIXED_MIN_SELECTIVITY_FOR_REGEXP
    ASSUME_FIXED_MAX_SELECTIVITY_FOR_REGEXP

    looks like there are *three* possible estimations: the default plus those two.

    Reply
  • I’d love to meet the person who gets the patent on heuristics for estimating number of rows for a regular expression.

    Reply
    • I was genuinely surprised that the estimate didn’t simply decrease as the number of filters passed into the regular expression increased. I would guess that the more filters you add, the less data is likely to match.

      That isn’t necessarily true either – for example, if you’re filtering for email addresses in a column that exclusively has email addresses – but it feels much more accurate than a hard-coded 30% estimate. The whole point of using regex is for complex filtering. If 30% of the data matched, you probably wouldn’t need a complex filter to begin with!

      Reply
      • Not so suprised given there’s very little thats done with LIKE and the options for filtering are way way less than a regex.
        Maybe there are optimisations they can implement over time, as they have done with LIKE.

        In summary
        * the estimate is fixed which is good and bad, and they could probably do more, but if you’re putting that in the middle of a large table query you aren’t looking for high perf.
        * the performance of the actual function isn’t bad. LIKE beats it for a simple LIKE but LIKE gets beaten if it gets complex. So this isn’t a FORMAT function scenario, the perf of the function isn’t bad.

        ps.Looks like the estimate dropped to 9% since you tested.

        Reply
  • […] you’re into regular expressions, they’re now built in, so you don’t have to install third party tools for it anymore. However, I’d caution […]

    Reply
  • Erland Sommarskog
    December 2, 2025 9:50 pm

    Brent, I think you should redo the tests on SQL 2025 RTM (not in Azure as of now, the final bits are not in place there yet). They have done quite a bit of work with performance. One issue they had was a large overhead when passing the data to RE2, the external library that they use. This has largely been addressed. On the other hand, they are no longer trying to use indexes. Indexes are organised according to a collation, but regular expressions are essentially binary. They try to add extra predicates to permit use of indexes, but with some collations that did not work out well.

    Reply
    • Erland – woohoo, that’s great to hear! I updated the post with the SQL Server 2025 RTM plan, which does indeed use waaaaay less CPU time, thank goodness. The estimates and lack of index usage are still a problem, but still, this is way better than it was! Thanks for the heads up.

      Reply
  • […] when Microsoft first announced that REGEX support was coming to SQL Server 2025 and Azure SQL DB, I gave it a quick test, and the performance was horrific. It was bad in 3 different […]

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.