When someone brings me a query to tune, I start with a few questions:
How long does it take to run now? I’m looking for a time at first, and later as I progress through tuning, I’ll measure it more scientifically with logical reads and CPU cycles.
How long would you like it to take? Meaning, when do I know I’ve crossed the finish line and can stop tuning? Sure, I’d love to make every query take 5ms, but sometimes users are happy if it just doesn’t time out.
Am I allowed to change the query? Is it a third party ISV app that I’m not allowed to touch, or can we fiddle with it? Can I split it into multiple queries? Can I move it into a stored procedure? No, stored procs aren’t inherently faster, but if you care enough about it that speed matters, then moving it into a stored proc gives me the ability to tune it faster. I can run through more iterations more quickly without having to recompile an app.
How often does it run? Are we talking once a day as a scheduled job, or thousands of times per second as part of a page-rendering process?
How often does the data change? I try to deduce this by looking at the tables and fields involved, but I like getting confirmation from the developers. Does it change every time it’s queried, or about once a minute, or once an hour, or maybe…never?
When we get to this point, sometimes I get to ask my favorite question: why are we querying this at all? Why not cache it in the application and only fetch the updates periodically, like on startup? Often we store application parameters in the database because, well, it sure beats an INI file, and it lets people change application config on the fly without restarting things. Another common culprit is geographic data – what’s the exact latitude and longitude of a city, and what other cities are nearby? What stores are near each other?
This starts a conversation about caching: temporarily saving things in memory in the app or web tier. The most basic version of this is in-memory variables in the app – this is the fastest way to get started, but the least scalable. You can’t persist large chunks of data here, and you end up with duplicate caches across all the app/web servers. Instead, build a caching tier with software like memcached, redis, Amazon ElastiCache, Windows AppFabric Cache, and Windows Azure Caching. These allow you to scale out your cache quickly and cheaply using commodity hardware and no fancypants licensing.
Caching does not sit in front of the database server – it sits alongside:
The application code has to be written so that when it needs data:
- The app checks to see if a valid copy of the data is in the cache tier
- If not, the app queries the database
- The app adds the data to cache to avoid hitting the db again – and maybe includes an expiration date.
(Note: some caches actually do purport to sit between your app and the database. Personally, I’m not a big fan of that because now your app is tied into a specific cache provider, and that’s risky today.)
In step 1, I glossed over a whole difficult concept by saying a valid copy of the database. Cache invalidation is notoriously hard - how do we tell the cache tier when something has been changed in the database? That’s not something we’re going to cover here, but don’t let that stop you from implementing caching. There are sure-fire, no-brainer cacheable wins right now in your environment. Cross that bridge now, and then cross the cache invalidation bridge when you get to it.
Once you’ve implemented a caching tier, performance tuning becomes a whole new ball game. Instead of trying to make a query 10% or 20% faster, you’ll avoid querying the database at all.
Five things to remember about building your cache:
- Try to write your app’s caching code in a way that will let you easily change cache providers. It’s just a key-value store, and key-value stores change fast.
- Size your database server so that it’ll survive your loads if the cache falls over. This is made easier if you also…
- Build emergency switches into your app to disable database-intensive features if necessary. If the cache dies, this will give you room to breathe.
- Monitor your cache tier. Pick metrics that will help you determine when it’s time to add horsepower.
- Look for data that maybe doesn’t belong in the database – ever. As you grow more comfortable with your cache server’s reliability, start picking elements of your web app like “Today’s Popular Items” or “5 Most Recent Hits” that should only be persisted in cache.