Say you run an online store, and on each item’s page, you need to show related items that people purchased. Let’s take an Amazon page for my favorite rechargeable AA batteries:
In a perfect world, we would cache this data in the web/app tier – but back here in the real world, sometimes our developers build stored procedures to fetch this kind of data, and the stored procedure ends up getting called way too often.
To solve it, let’s build a caching layer into our stored procedure.
Here’s the pseudocode of how our stored procedure usually works:
I’m using a really simple stored procedure, but this approach works best when you have a complex stored procedure that does a lot of heavy lifting – calculations, querying big tables, aggregations, etc.
Instead of directly doing all the heavy work, check this out:
I’ve introduced a new table here – Cache.dbo.GetRelatedItems. I created a new database called Cache, and I keep it in simple recovery mode. I can use a totally different backup strategy for this database – perhaps not even backing it up at all.
The Cache.dbo.GetRelatedItems table has exactly the same columns that are normally returned by usp_GetRelatedItems, plus the input field. In this case, our stored procedure has an input field of ItemID, and it returns RelatedItemID and RelatedItemName, so the table would look like the one shown at right.
Rocket surgeon readers will note that I added an ID field to give the table something unique. They will also notice that I didn’t add a DatePopulated or DateCached field – depending on my business needs, I would probably just truncate this table every night. In the case of related items in an e-commerce store, I might only truncate it every week, and I’d want to do it right before a really low load period so that the cache could gradually refresh. This outright-truncation technique is less efficient for refreshing the cache, but it minimizes the locking required by deletes. In a caching setup, I’m worried about concurrency here.
When implementing a solution like this, I usually do a lot of A/B performance testing to find the right clustered index for the table. Typically each caching table has no non-clustered indexes, and has just one clustered index designed to produce the fastest range scans for the number of parameters for the stored proc. (Before somebody posts a comment asking for that strategy, no, I’m not blogging that, because it’d take me a day, and I’m lazy.)
If you choose not to back up the Cache database, your code should not rely on the existence of objects in it. It should start by checking to see if the Cache.dbo.GetRelatedItems table exists, and if not, create it. That way if you fail over to your DR site and the Cache database is empty, your queries won’t fail.
I’ve over-simplified the stored procedure a little, though – something actually has to populate the cache table. There’s two ways I could do it: externally, like a SQL Agent job or an SSIS process, or internally – inside the stored procedure itself. Let’s code that:
We start by checking the cache table for results, and if we don’t find any, we add them.
I’m cringing as I write this because I can hear the screams of performance tuners. Yes, I’m adding additional write load on the SQL Server – keep in mind that I only use this approach when I’m faced with:
- A very work-intensive but read-only stored procedure
- Called very frequently (hundreds or thousands of times per minute)
- Whose results change less than once a day (or where we’re not concerned about real-time accuracy)
- A business that needs immediate speed and can’t wait for developers to implement a caching layer
As soon as I deploy a solution like this and the business pain goes away, I immediately start working with the developers on a better long-term solution. This solution is an emergency band-aid to get the business up and running, but it still incurs load on the SQL Server for writing the caching results, getting locks, and running the stored procedure. This is when I start talking to the developers about caching in the app tier, and here’s my favorite resources on that: