T-SQL Has Regex in SQL Server 2025. Don’t Get Too Excited.
23 Comments
Regular expressions are a way of doing complex string searches. They can be really useful, but they have a reputation: they’re hard to write, hard to read, and they’re even harder to troubleshoot. Once you master ’em, though, they come in handy for very specific situations.
This post isn’t about their complexity, though. This post is about Azure SQL DB & SQL Server 2025’s regex performance.
In this case, regular expressions are like Everclear.
Everclear is a brand of alcohol, and when you read that, you might think, “Sounds interesting, I wonder what it tastes like.” It’s marketed as odorless, flavorless, and colorless. I say “marketed” because if you’ve ever been around it, you will not use terms like odorless or flavorless. To get a rough idea of what it’s like, watch The Unemployed Wine Guy try it.
See, Everclear is 95% pure alcohol by volume. It’s so potent that many states have outlawed it. You know Nevada, the state with gambling, weed, hookers, and assault weapons? Yeah, no Everclear allowed here. Gotta draw the line somewhere.
You’re not supposed to drink Everclear straight, but that trained professionals are supposed to use it – in very small quantities, in combination with other ingredients.

Let’s give regex a taste.
We’ll use the Stack Overflow 2010 database (the small one) in Azure SQL DB. The Users table on it has about 9M rows amounting to about 1GB in size.
Say we’re trying to find out if any sneaky users have been putting PII in places that it shouldn’t belong, like this week’s Query Exercise challenge. We want to find all Users.Location values where the user put in something resembling an email address. We’ll put an index on the Location column to give Azure SQL DB a fighting chance, and to give it statistics so it can better estimate how many rows might match a predicate, then run our query:
|
1 2 3 4 5 6 7 |
CREATE INDEX Location ON dbo.Users(Location); SET STATISTICS IO, TIME ON; SELECT TOP 100 * FROM dbo.Users WHERE REGEXP_LIKE(Location, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'); |
The resulting execution plan is really disappointing:

SQL Server could have scanned the Location index rather than the full table, but it chose not to.
Update 2025/12/02 – Good news! SQL Server 2025 RTM fixes half of the problem here. In the actual query plan, it still scans the table rather than touching the index, but at least the CPU usage is waaaay down.

I blogged more about SQL Server 2025’s implementation (and its remaining problems) here.
Why’s the performance taste so bad?
Reader: “It’s because you asked for SELECT *, and your index only has the Location column.” Not so fast – I only asked for 100 rows, which means that Azure SQL DB could have scanned the index to find the first 100 matching rows, and then done key lookups to fetch the columns, and it would have been faster. Let’s prove it:
|
1 2 3 4 5 6 7 |
SELECT TOP 100 * FROM dbo.Users WHERE REGEXP_LIKE(Location, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'); SELECT TOP 100 * FROM dbo.Users WITH (INDEX = Location) WHERE REGEXP_LIKE(Location, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'); |
And the abridged version of STATISTICS IO shows that the index would have been way, way less page reads:
|
1 2 3 4 5 |
Table 'Users'. Scan count 1, logical reads 157281 CPU time = 60500 ms, elapsed time = 60955 ms. Table 'Users'. Scan count 1, logical reads 20827 CPU time = 58546 ms, elapsed time = 58606 ms. |
And the query plan can indeed happily use the index. The top plan is the default, the bottom plan is with my index hint:

Reader: “So why is Azure SQL DB avoiding the index? Is the tipping point involved?” If Azure SQL DB believes a lot of rows will match your query’s predicate, then can veer towards clustered index usage rather than nonclustered indexes. However, look at those plans again: they can’t produce more than 100 rows! My query is specifically asking for the first 100 rows. There’s no sort. Just gimme the first 100 rows that match. I absolutely guarantee that this query could not produce more than 100 key lookups.
Reader: “Then why is Azure SQL DB avoiding the index then?” Well, when Microsoft coded the row estimates for regex, they made some unbelievably drunken decisions, perhaps involving chugging straight Everclear. Let’s run a different query to expose what’s happening:
|
1 2 3 |
SELECT COUNT(*) FROM dbo.Users WHERE REGEXP_LIKE(Location, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'); |
The number to pay attention to here is the estimated number of rows that will match. Hey, Azure SQL DB, how many rows in our Location column will match this regex search?

Azure SQL DB believes that 2,675,250 rows will match – when only 68 do. It’s off by a factor of 39,342X!
Performance is bad because Microsoft got lazy – again.
That 2,675,250 row estimate will be familiar to those of you who shelled out money for my Fundamentals of Query Tuning class. That number just happens to be exactly 30% of the row count of the Users table.
When Microsoft coded the row estimates for their fancy new regex functions – like many other kinds of search predicates – they simply hard-coded it and called it a day. No matter what your regexp_like query is looking for, Azure SQL DB guesses that 30% of the rows match. Sometimes, that estimate is fine – but most of the time, it’s not.
This is the kind of performance issue that doesn’t seem bad at small data sizes, but as your data grows, it –
wait a second
hang on
Our table only has 9 million rows, and it’s less than 1GB in size, and the regexp_like takes sixty seconds of CPU time!

That’s… terrifying. Sixty seconds of CPU time to query a 1GB table. That isn’t gonna scale, full stop. I mean, it’s fine for one-off utility queries on really small amounts of data, like if you’ve filtered the rowset down to a manageable number (like, 1,000 rows), but nothing you’d want to put into production on an OLTP system.
Plain ol’ queries are way better.
If you only need to find out if PII is getting out there, try a regular like query instead:
|
1 2 3 |
SELECT TOP 100 * FROM dbo.Users WHERE Location LIKE '%.com'; |
Reader: “Whoa Brent, you can’t start a like search with a leading wildcard! That’s bad!” Seriously, you gotta shut your pie hole every now and then, or at least go buy my Fundamentals classes, because you’re in for a big surprise. That query doesn’t have an index hint, and yet, Azure SQL DB is smart enough to use the index:

And it does less logical reads, and runs in just a couple seconds:
|
1 2 3 4 5 |
Table 'Users'. Scan count 1, logical reads 17734 Table 'Worktable'. Scan count 0, logical reads 0 SQL Server Execution Times: CPU time = 2359 ms, elapsed time = 2455 ms. |
Are the results the same? No – but it’s enough to solve the business problem and tell us that yes, people are indeed using email addresses in their Location columns. (I’m not putting those here on the blog, but you’re welcome to run the query yourself to see the results.)
That’s why I say regex is like Everclear.
Everclear isn’t inherently bad or evil – at least, not any more than gambling, weed, hookers, and assault weapons. (You hear me, Nevada?) You just have to be really careful when you use it, and understand what you’re in for. It shouldn’t be your first resort on a typical Saturday night – try other solutions like gin & tonic instead.
Similarly, regex in Microsoft databases can be fine, too – in combination with other ingredients. It shouldn’t be your first resort, and it certainly shouldn’t be the only thing in your where clause or joins. If you have a bad experience with it, your first resort should be to take it out, and try an alternative solution.
And whatever you do, don’t use regex and Everclear together.
Update 2025-07-05: Louis Davidson experimented with the new ASSUME_FIXED_MIN_SELECTIVITY_FOR_REGEXP and ASSUME_FIXED_MAX_SELECTIVITY_FOR_REGEXP query hints. It doesn’t improve the performance of queries like this blog post, but if you’re going to use REGEXP_LIKE in your code, they could be useful hints for tuning other queries.





If your company is hiring, leave a comment. The rules:


























