How to Cache Stored Procedure Results

Last Updated February 13, 2017

Say you run an online store, and on each item’s page, you need to show related items that people purchased. Let’s take an Amazon page for my favorite rechargeable AA batteries:

In a perfect world, we would cache this data in the web/app tier – but back here in the real world, sometimes our developers build stored procedures to fetch this kind of data, and the stored procedure ends up getting called way too often.

To solve it, let’s build a caching layer into our stored procedure.

Here’s the pseudocode of how our stored procedure usually works:

I’m using a really simple stored procedure, but this approach works best when you have a complex stored procedure that does a lot of heavy lifting – calculations, querying big tables, aggregations, etc.

Instead of directly doing all the heavy work, check this out:

I’ve introduced a new table here – Cache.dbo.GetRelatedItems. I created a new database called Cache, and I keep it in simple recovery mode. I can use a totally different backup strategy for this database – perhaps not even backing it up at all.

Cache-Table

The Cache.dbo.GetRelatedItems table has exactly the same columns that are normally returned by usp_GetRelatedItems, plus the input field. In this case, our stored procedure has an input field of ItemID, and it returns RelatedItemID and RelatedItemName, so the table would look like the one shown at right.

Rocket surgeon readers will note that I added an ID field to give the table something unique. They will also notice that I didn’t add a DatePopulated or DateCached field – depending on my business needs, I would probably just truncate this table every night. In the case of related items in an e-commerce store, I might only truncate it every week, and I’d want to do it right before a really low load period so that the cache could gradually refresh. This outright-truncation technique is less efficient for refreshing the cache, but it minimizes the locking required by deletes. In a caching setup, I’m worried about concurrency here.

When implementing a solution like this, I usually do a lot of A/B performance testing to find the right clustered index for the table. Typically each caching table has no non-clustered indexes, and has just one clustered index designed to produce the fastest range scans for the number of parameters for the stored proc. (Before somebody posts a comment asking for that strategy, no, I’m not blogging that, because it’d take me a day, and I’m lazy.)

If you choose not to back up the Cache database, your code should not rely on the existence of objects in it. It should start by checking to see if the Cache.dbo.GetRelatedItems table exists, and if not, create it. That way if you fail over to your DR site and the Cache database is empty, your queries won’t fail.

I’ve over-simplified the stored procedure a little, though – something actually has to populate the cache table. There’s two ways I could do it: externally, like a SQL Agent job or an SSIS process, or internally – inside the stored procedure itself. Let’s code that:

We start by checking the cache table for results, and if we don’t find any, we add them.

I’m cringing as I write this because I can hear the screams of performance tuners. Yes, I’m adding additional write load on the SQL Server – keep in mind that I only use this approach when I’m faced with:

A very work-intensive but read-only stored procedure
Called very frequently (hundreds or thousands of times per minute)
Whose results change less than once a day (or where we’re not concerned about real-time accuracy)
A business that needs immediate speed and can’t wait for developers to implement a caching layer

As soon as I deploy a solution like this and the business pain goes away, I immediately start working with the developers on a better long-term solution. This solution is an emergency band-aid to get the business up and running, but it still incurs load on the SQL Server for writing the caching results, getting locks, and running the stored procedure. This is when I start talking to the developers about caching in the app tier, and here’s my favorite resources on that:

Meet Doug Lane (Video)

Top 10 Signs Your SQL Server is Neglected

23 Comments. Leave new

Andre Ranieri
December 26, 2013 11:45 am

Thanks for the informative blog.

Reply
tobi
December 26, 2013 1:15 pm

Note, that two concurrent cache fills can cause the cache to be filled with duplicate results. This needs a concurrency strategy. Maybe just an app-lock.

Reply
Greg Low
December 26, 2013 9:51 pm

You never need to use the pattern IF EXISTS (do some SELECT) do the SELECT. All you need to do is to remove the IF EXISTS clause, just perform the SELECT and check the returned @@ROWCOUNT to work out if you need to reload the cache. Also, need to rework the bottom half to avoid race conditions and duplicated values.

HTH

Reply
- Brent Ozar
  December 27, 2013 7:53 am
  
  Yeah, exactly – my personal favorite is to populate the cache periodically via batch jobs rather than one-off stored procs, but I didn’t want to go into quite that much detail, heh.
  
  Reply
- Mike Good
  December 29, 2013 9:24 am
  
  Greg, I’m not sure I agree with your stmt regarding the pattern. If we just perform initial SELECT, and upon seeing @@ROWCOUNT is zero we then perform another SELECT, that’s two resultsets. I’m sure caller code can be adapted to handle this, but that may well be out of scope. Or am I wrong…is this one of those issues that used to be a problem a decade ago but no longer applies?
  
  Brent, good article. I’ve worked at shops where we had success with this pattern, but implemented at the next higher tier of the framework. App checked if results existed for proc+parm1+parm2…, and if so returned them, else ran proc & cached results, then returned them. Separate asynch routine deleted old items from cache. Advantage of this was we could use one simple framework for all manner of procs. If trying to do this within SQL, we need a different cache table for each different type of resultset; not terribly difficult, but not nearly as simple.
  
  Reply
  - Andrew
    February 26, 2014 7:31 am
    
    Mike, that would indeed cause two select result sets.
    
    You could get around it by using somethlng like:
    
    DECLARE @Test INT;
    SELECT TOP 1 @Test=1 WHERE Cache.dbo.field1 = @ItemId;
    IF @@ROWCOUNT = 0
    INSERT …;
    SELECT …
    
    (The @Test=… should swallow the resultset.)
    
    Cheers
    
    Reply
Zinx
December 29, 2013 2:19 pm

I have worked with similar logic to cache the results, but we used Tempdb to keep the cached results, and there was a separate async process to remove the cached results after an expiry time.

Reply
None
December 29, 2013 9:30 pm

Brent Ozar needs to study the English language before writing again.

Reply
- Brent Ozar
  December 29, 2013 9:38 pm
  
  Bad news – I’ve got blog posts queued up and no English books on the shelf.
  
  Reply
  - Michael J Swart
    January 2, 2014 3:04 pm
    
    Hey Brent, this comment (and your reaction) made me laugh.
    Obligatory
    
    Reply
    - Brent Ozar
      January 2, 2014 3:06 pm
      
      Hahaha, thanks sir. And for reference, I do have some English books on the shelf and queued up, heh.
      
      Reply
- scott mcnitt
  February 26, 2014 9:38 am
  
  My old man always says “never trust anyone that can spell… or type”.
  
  Maybe old fashioned.
  
  Reply
Tim M. Hidalgo
February 26, 2014 5:56 pm

And with puttin it on Ramdisk, it’s the new style!

Very cool idea Brent. I was able to patch up your template with medicinal rocketry to check for the existence of the database (if not just get the data from the table like normal), then as you suggested checking for the table (creating it if needed). I’m currently running some tests with the Cache db on RamDisk vs. Internal disk arrays.

Not only cool but extremely fun

Thanks!

Reply
- Si
  March 27, 2019 1:15 pm
  
  This is the driest humour I’ve seen in some considerable time. You’re hired
  
  Reply
Brian White
March 27, 2014 1:41 pm

I had to do something like this – except with a function. It’s much harder with a function because you’re not allowed to make your function do any database writes. I had to jump through cache checking steps on all the procs using the function

Reply
PLSQL
March 27, 2014 6:49 pm

Some readers might want to consider Oracle which has had native function/query result caching for a few releases. http://docs.oracle.com/cd/E11882_01/appdev.112/e25519/subprograms.htm#LNPLS00817

Reply
- Brent Ozar
  March 27, 2014 7:14 pm
  
  Just “consider Oracle”, huh? That’s your answer to a feature request? Change the entire database platform? Wow, that sounds reasonable.
  
  Reply
  - PLSQL
    March 27, 2014 9:42 pm
    
    No; asking SQLServer devoplement to implement function-result-caching would be another alternative. Of course, what you’re suggesting to “roll your own” is also possible, but getting it right in all cases can be tricky. BTW, Oracle is not the only vendor, MySQL has been providing result caching for longer than the Oracle.
    
    Reply
Daniel Ladd
April 25, 2014 9:56 am

Seems like this might be a viable place to use in-memory OLTP. The cache table could be in memory, giving it the speed that people would hope for in a cache.

Reply
- Brent Ozar
  April 25, 2014 11:31 am
  
  Daniel – if the data is frequently cached, it would be in memory anyway. SQL Server automatically caches data that is frequently queried. Plus, the in-memory OLTP (Hekaton) would suffer from the same problem that regular tables would – it still only caches the raw data, not the query output, so it would have to rebuild the query output every time the query runs. I like your thinking though!
  
  Reply
adsf
December 28, 2018 5:16 am

Chutya

Reply
Francesco Mantovani
September 16, 2021 7:29 am

Amazing. This post is so good I had to read it twice.
Being the Cache DB probably small I wonder if it could be possible to put it on a RAM drive.

Reply
- Brent Ozar
  September 16, 2021 7:50 am
  
  Hahaha, thanks!
  
  Reply