SQL Server’s full text search is amazing. Well, it amazes me at least – it has so many cool capabilities: looking for prefixes, words near each other, different verb tenses, and even thesaurus searches. However, that’s not how I see most people using it: I’ve seen so many shops using it for matching specific strings, thinking it’s going to be faster than LIKE '%mysearch%'. That works at small scale, but as your data grows, you run into a query plan performance problem.
When your query uses CONTAINS, SQL Server has a nasty habit of doing a full text search across all of the rows in the table rather than using the rest of your WHERE clause to reduce the result set first.
This isn’t a big deal for small text data sets, say under 10GB, but your query time grows linearly along with your data. By the time your data reaches real production sizes, full text can get uncomfortably painful.
To illustrate it, let’s say I’m using the Stack Overflow database, and I want to optimize this query:
SELECT TOP 100 *
FROM dbo.Posts
WHERE CreationDate BETWEEN '2013-01-01' AND '2013-12-31'
  AND Title LIKE '%java%'
ORDER BY CreationDate;
The query runs in milliseconds if I have an index on CreationDate and Title:
CREATE INDEX CreationDate ON dbo.Posts(CreationDate) INCLUDE (Title);
Because SQL Server chooses to seek to my specific date first, then check the posts’ titles, as the actual query plan shows:
That’s awesome! Nobody’s gonna complain about that speedy query. If you hover your mouse over the Index Seek, it had to read 4,510 rows to find 100 rows that had "java" in the title – but that’s not terrible. It only did 100 key lookups, too. It’s super-quick:
Table 'Posts'. Scan count 1, logical reads 465
SQL Server Execution Times:
  CPU time = 0 ms, elapsed time = 181 ms.
However, try the same thing with full text.
I’m going to set up full text search and create a full text index on the Title column:
CREATE FULLTEXT CATALOG ft AS DEFAULT;
GO
CREATE FULLTEXT INDEX ON dbo.Posts(Title)
  KEY INDEX PK_Posts__Id
  WITH STOPLIST = SYSTEM;
GO
If you try that yourself, be aware that the above query will finish instantly, but it’s kicking off a full text index population in the background. That takes way, way longer to complete than the nonclustered index did, so while you wait, here’s how to check the status of your full text index populations.
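If you just want a quick check, here’s a minimal sketch using SQL Server’s built-in metadata properties – TableFulltextPopulateStatus returns 0 once the population is idle:

-- Check whether the full text population on dbo.Posts has finished.
-- PopulateStatus: 0 = idle (done), 1 = full population in progress.
SELECT OBJECTPROPERTYEX(OBJECT_ID('dbo.Posts'), 'TableFulltextPopulateStatus') AS PopulateStatus,
       OBJECTPROPERTYEX(OBJECT_ID('dbo.Posts'), 'TableFulltextItemCount') AS ItemsIndexed;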
When the full text population finishes, try the full text version of the query with CONTAINS:
SELECT TOP 100 *
FROM dbo.Posts
WHERE CreationDate BETWEEN '2013-01-01' AND '2013-12-31'
  AND CONTAINS(Title, '"java"')
ORDER BY CreationDate;
And the query is slower because it does more work:
Table 'Worktable'. Scan count 8874, logical reads 1037897
Table 'Posts'. Scan count 1, logical reads 36139
SQL Server Execution Times:
  CPU time = 609 ms, elapsed time = 814 ms.
Because of the different plan:
The plan operators, top to bottom:
- SQL Server does an index seek on CreationDate, but it reads more rows
- It does 8,874 key lookups – up from just 100
- The full text search finds all posts with java in the title, regardless of date – producing 440,615 rows
Complex searches are even worse.
Let’s throw in a simple “and”, looking for both java and sql:
SELECT TOP 100 *
FROM dbo.Posts
WHERE CreationDate BETWEEN '2013-01-01' AND '2013-12-31'
  AND CONTAINS(Title, 'java and sql')
ORDER BY CreationDate;
The query takes over 20 seconds thanks to an imbalance of work across cores:
Table 'Worktable'. Scan count 778381, logical reads 1573682
Table 'Posts'. Scan count 5, logical reads 3168118
SQL Server Execution Times:
  CPU time = 6938 ms, elapsed time = 21032 ms.
And the actual plan shows just how much wasted work there is when we’re only trying to get 100 rows:
Why does SQL Server do it this way? I dunno – it just always has. It finds all the data that matches, and then joins to the rest of the data you’re searching for. This isn’t a new problem – it’s been the same way for over a decade – and it’s one of the reasons Stack Overflow had to move their search out of SQL Server as well.
Just for laughs, if I use LIKEs with leading % signs:
SELECT TOP 100 *
FROM dbo.Posts
WHERE CreationDate BETWEEN '2013-01-01' AND '2013-12-31'
  AND Title LIKE '%java%'
  AND Title LIKE '%sql%'
ORDER BY CreationDate;
That finishes in less than a second, with 2,426 logical reads. Here’s the actual plan.
Full text search doesn’t integrate well in query plans.
If you have a really simple query and you’re filtering for very unusual keywords, then full text search is pretty dang good. It brings powerful capabilities to SQL Server built right into the box.
However, the more complex your query plan becomes, like the more filtering that’s being done on other tables, AND the more common your search keywords are, the angrier you’ll become with full text performance. SQL Server will choose to perform the full text search across all of the rows regardless of your other filters, then drag a ton of data back into the execution plan and force the rest of the operators to deal with it.
In this particular example, a LIKE with a leading % sign – normally a DBA’s enemy – is actually a performant solution. I wanna stress, though, that that isn’t a solution that works in the majority of shops.
In other situations, I’ve had to point folks toward:
- Building a terms table – having people check boxes to pick from common terms, or having your app examine their search string and redirect it to the terms table rather than using full text search (see the sketch after this list)
- Paul White’s trigram search solution – which requires you to build a separate table, keep it in sync, and rewrite your app’s searches to point at it
- Elasticsearch – but then it’s left up to you to keep SQL Server and Elasticsearch’s contents in sync, and to rewrite your app’s searches to point at Elastic rather than SQL Server
- Scaling out with Availability Group readable replicas – which costs a fortune, since they have to be Enterprise Edition, but at least you can scale the search workloads out without changing your application, or it can buy you time while you rewrite the search part of your app
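To make that terms table idea concrete, here’s a minimal sketch. The table and column names (dbo.PostSearchTerms, Term, PostId) are made up for illustration, and you’d have to populate the table and keep it in sync with your titles yourself:

-- Hypothetical terms table: one row per (term, post) pair.
CREATE TABLE dbo.PostSearchTerms (
    Term nvarchar(100) NOT NULL,
    PostId int NOT NULL,
    CONSTRAINT PK_PostSearchTerms PRIMARY KEY CLUSTERED (Term, PostId)
);

-- A search for "java" becomes an equality seek plus a join,
-- so the date filter and TOP can do their jobs normally:
SELECT TOP 100 p.*
FROM dbo.PostSearchTerms t
INNER JOIN dbo.Posts p ON p.Id = t.PostId
WHERE t.Term = N'java'
  AND p.CreationDate BETWEEN '2013-01-01' AND '2013-12-31'
ORDER BY p.CreationDate;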
None of those solutions are easy or cheap.
21 Comments
Thanks for this, I’ve been waiting for it since you said you were going to blog about it.
We are upgrading IBM Maximo to 7.6, switching from Oracle to MS SQL Server, and the DB has been configured to use full-text search.
I only noticed it because they were leaving the indexes in place while transferring the data on the 'workorder' table, which is very wide and has lots of rows 🙂 (the table is stupid wide with text fields and is around 20 million rows). I created a couple of scripts to delete and recreate the indexes, but they failed due to the full-text indexes. So I rewrote them to disable and then rebuild, but I think they altered the transfer process to create the indexes after the inserts (the inserts seem like they are encapsulated in a transaction, one row at a time).
Thanks again but it also makes me feel depressed about the performance.
You’re welcome!
Great link to Paul White’s trigrams: now we know why so many autocomplete UIs only start suggesting after 3 characters…
Another vote for Elasticsearch here 🙂 These days it’s really about using the right tool for the job — while not dying maintaining too many tools.
I agree, Enterprise Search Software. Unfortunately that means not SQL Server. I was musing back in 2016, after learning about Azure Data Warehouse and its hybrid integration with Hadoop, whether that meant some version of Lucidworks was coming to SQL Server. I guess not.
Anyone got any best practice guides / examples for syncing SQL Server tables with ElasticSearch? I’ll shortly be tackling that problem and would love to know how people went about it and what problems they faced.
Your best bet there would be to ask at a Q&A site like https://dba.stackexchange.com.
Would Azure Cognitive Search be a good alternative?
It might be – how would you propose to keep that data in sync with the SQL Server’s contents?
We use Azure Cognitive Search (Azure Search Services) and set up the indexes to refresh on a schedule between 5 minutes and 24 hours via the Azure portal. The data sources can be Azure SQL Database & Managed Instance tables or views (+many more). It is super important to use rowversion/timestamp as the high-water mark or you will get stale results.
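For context, that high-water-mark pattern just needs a rowversion column on each table or view being indexed; a minimal sketch (the column name RowVer is an assumption):

-- Hypothetical: add a rowversion column so the indexer can detect changed rows.
ALTER TABLE dbo.Posts ADD RowVer rowversion;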
We don’t use any of the AI enrichment that comes with Cognitive Search. Our software devs use APIs on web app search pages to return results.
We used to provision a small Azure virtual machine just for ElasticSearch but Azure Cognitive Search (Azure Search Services) does the trick for our clients.
Mostly everything we develop (web app and database) for our clients is deployed in Azure. https://www.emergentsoftware.net/work has a representative list of client web apps that use Azure Search (some, not all).
Gotcha: so just to rephrase, you need to change your tables to include a rowversion or timestamp, and then it’s automatic, but the data’s out of date by 5 minutes or more. That’s a fairly significant dealbreaker for a lot of companies, but that’s a neat fit for some!
If you know that the full text portion of the query is going to be much more selective than the non-full-text portion, you can use CONTAINSTABLE and a FORCE ORDER hint, resulting in a query that runs a lot faster. In the version of the StackOverflow DB that I downloaded, the CONTAINSTABLE portion of the join returns 1896 rows and the date-filtered portion of the Posts table is 5.4 million rows. The query executes in about a tenth of a second on my machine with an empty cache.
SELECT TOP 100 p.*
FROM CONTAINSTABLE(dbo.Posts, Title, 'java and sql') ct
INNER JOIN dbo.Posts p ON ct.[KEY] = p.Id
WHERE p.CreationDate BETWEEN '2013-01-01' AND '2013-12-31'
ORDER BY p.CreationDate
OPTION (FORCE ORDER);
Greg – neat idea, but that’s pretty tough to recommend for production code for so many reasons:
* You have to know the selectivity of what you’re looking for
* You have to be able to build different queries (with/without the hint) when your parameters change
* You have to be able to dramatically change the query (doubling the number of references to the Posts table) – this doesn’t work well with tools like EF or NHibernate
* You have to be positive that FORCE ORDER will work – its performance often falls apart catastrophically on queries with multiple joins
Neat idea though!
We are looking for a full-text based solution for a new app. Usually we’re a SQL Server house, but stacked up against other products, SQL Server just doesn’t seem to cut the mustard.
Currently looking at:
– ElasticSearch (based on your recommendation above)
– Postgres: full-text searches in Postgres seem way faster on a like-for-like search across the same data
– A combination of the above using ZomboDB
Hate to be nitpicky about typos (That’s not true, I actually love and enjoy pointing out these mistakes), but in the first script you filtered by titles containing “java”
SELECT TOP 100 *
FROM dbo.Posts
WHERE CreationDate BETWEEN '2013-01-01' AND '2013-12-31'
AND Title LIKE '%java%'
ORDER BY CreationDate;
But in the next passage you wrote
“That’s awesome! […] to find 100 rows that had “SQL” in the title […]”
And the pasted query plan also filtered for ‘%SQL%’
That’s all, that’s my comment 🙂
Love what you’re doing here btw, your blog posts and videos/streams are super informative and entertaining at the same time.
Thanks, glad you like my work!
Thanks for the post. In my application the issue gets even worse when you have an ORDER BY in the query as well. I switched from CONTAINS to CONTAINSTABLE and the issue got resolved.
I’ve found on SQL Server 2016 Enterprise, full text searches on secondary replicas cause a LOT of redo thread contention. Is this something you have experienced yourself, Brent?
If the searches are seldom, fine. If you get hundreds or more per minute, the schema is locked for the runtime of each of those queries, and that has led to secondaries falling way behind.
That’s not just full text queries – any queries on a read-only replica will lock the schema while they’re running, just like they do on a primary. The table isn’t allowed to change underneath you.
My bigger question is, “What’s trying to constantly change the schema on the primary?” That question will lead you to the real villain in your environment. Hope that helps!
Running SQL 2019 in 2014 compat mode, we see the slowness as well. However, hinting compat level 150, the execution time gets much better: OPTION (USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_150'))
Now, if we could only safely switch to 2019 compat mode. LOL
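For reference, that hint tacks onto the end of the query; a sketch applying it to the earlier CONTAINS example:

SELECT TOP 100 *
FROM dbo.Posts
WHERE CreationDate BETWEEN '2013-01-01' AND '2013-12-31'
  AND CONTAINS(Title, 'java and sql')
ORDER BY CreationDate
OPTION (USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_150'));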
Searching was killing us! We didn’t want anything too fancy at the time, but wanted it to be fast. The search is for company names, e.g. 'Ralphie Ralph LLC DBA Big as Texas Burgers, Buns, and Beer'. Substring searches were too slow and required the user to know the order of the words. Our solution was to parse each company name in our database into its component words and store them with a reference to the company. Not only did this speed things up, it made searching much more flexible. A user could enter 'Big Bee bur' and get the company in the returned list. This was 12 years ago. Things are still working. We are working to beef it up, no pun intended.
8 years ago, we added an intermediary table to presort our searches by selectivity. We search for the most selective term and use the result as the criteria for the next level of searching. Each pass increments a counter on the company list generated by the first search. It is all based on the beginning of the words, so each search is a LIKE '___%'. Time has caught up with us. I still have users using single letters to search for names like the above, but I also have customers with names like 'b &b LLC'. 2/3 of the companies have LLC in their names, so that term is useless; the rest of the query just brings back every company that has a word starting with 'B'. I’m rewriting it because we did some very un-smart things. We had table vars and table-valued functions; I fixed all of that. Even the single-letter names work now: if we are saving a single letter, we pad it out to the letter repeated 5 times, and same with two-letter words, we just double them. This has eliminated most of the massive return sets. I haven’t released what I’ve done, so now, while there’s still time, does anyone have a better idea? The concept is pretty similar to Paul’s trigrams, but a little easier for this mere earthling to understand.