Is it Cheating to Solve A Performance Problem By Adding Memory?

SQL Server
23 Comments

We had a comment recently on Five Things That Fix Bad SQL Server Performance that got me thinking. The comment came from a frustrated system administrator, who wrote:

Chucking resources at a problem is not solving that problem, it’s just temporarily masking the symptoms of that problem.

Funnily enough, I’ve heard the exact same thing from developers who are facing a tough problem on a dramatically undersized SQL Server. The basic gist of what they ask is:

Doesn’t it make me a bad developer if I can’t solve this problem with code?

Everybody’s worried that adding memory is somehow cheating.

Performance Problems and Memory

There are three general categories of performance problems relating to memory:

  1. Problems you can solve with memory OR code / indexes
  2. Problems you can solve only with code / indexes and NOT just memory
  3. Problems you can solve only with memory and NOT just code / indexes

For category #1, solving the problem with memory is often cheaper than changing code and indexes: the operational cost of the hours of development and testing can be quite high.

But I specifically started thinking about category #3. This is the category that the frustrated sysadmin and proud developer think doesn’t exist!

Here are three examples of problems that you can solve with memory, and not simply with code and indexes:

1. Memory Reduces IO On a Read-Hungry Workload

You’ve got a critical OLTP application. You’re using shared storage. Most of the time, your storage throughput is pretty good. But sometimes, other users of the storage get busy, and your read and write latencies go through the roof. Your queries slow down, blocking increases, and timeouts start hitting the application. You’ve tuned your indexes, but parts of the application rely on realtime reports that need to scan large tables.

SQL Server is designed so that in this case you can add more memory and reduce read IO to the storage, giving you more stable performance. It’s absolutely not cheating to give the server more memory — speeding up the storage would be much more expensive.
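One rough way to gauge whether your workload fits this category is to watch how long data pages survive in the buffer pool. Here’s a sketch using a standard DMV; keep in mind the counter is instance-wide, and the old “PLE over 300” rule of thumb is only a starting point, not a target:

```sql
-- Page Life Expectancy: roughly how long (in seconds) a data page
-- survives in the buffer pool before being evicted. A persistently
-- low value on a read-heavy workload suggests that more memory would
-- directly reduce read IO against the storage.
SELECT [object_name],
       counter_name,
       cntr_value AS ple_seconds
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy'
  AND [object_name] LIKE '%Buffer Manager%';
```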

2. Some Queries Need Lots of Workspace Memory

In another database, you do lots of reporting and aggregation. You’ve got large tables. Your queries frequently join them, do sorts, and use parallelism. All of these operations need memory, and the more data you’re aggregating and querying, the more memory each of these queries can need.

Before a query starts running, it figures out how much of this workspace memory it needs, and looks at what is available given other queries that are running. SQL Server has to balance the memory used for data pages, execution plans, and this workspace memory: it doesn’t want to let one query take over! If not enough memory is available, your query has to wait.

You’ve optimized your queries and indexes and made sure the memory estimates are realistic, but when lots of people run reports, your queries can’t even get started because of a memory crunch. Solving this problem by adding more memory isn’t cheating: it’s helping the SQL Server do what it’s designed to do.
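You can actually watch this crunch happen. As a sketch, this DMV query lists sessions that have asked for a workspace memory grant and are still waiting for it (these typically show up with RESOURCE_SEMAPHORE waits):

```sql
-- Sessions with grant_time = NULL have requested workspace memory
-- and are still waiting in line for it.
SELECT session_id,
       requested_memory_kb,
       granted_memory_kb,   -- NULL while the query is still waiting
       wait_time_ms
FROM sys.dm_exec_query_memory_grants
WHERE grant_time IS NULL
ORDER BY wait_time_ms DESC;
```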

3. Some Features Need Lots of Memory Periodically

You’re managing a data warehouse where you’re using table partitioning to implement a sliding window. This has helped your users a lot: you can add and remove large amounts of data with minimal locking and blocking.

But for query performance reasons, you’ve had to add one or two non-aligned indexes that are present during the day. Non-aligned indexes prevent partition switching, so each night you have to disable them before your sliding window data load and archive, then rebuild them afterward. Rebuilding the partitioned index can consume large amounts of memory.
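The nightly routine might look roughly like this; the table, index, and partition names here are purely illustrative:

```sql
-- Non-aligned indexes block partition switching, so disable them first.
ALTER INDEX IX_Sales_ByCustomer ON dbo.Sales DISABLE;

-- Switch the oldest partition out to an archive table: a metadata-only
-- operation, so it's fast and takes minimal locks.
ALTER TABLE dbo.Sales SWITCH PARTITION 1 TO dbo.SalesArchive;

-- Re-enable the non-aligned index by rebuilding it. This rebuild over
-- the whole partitioned table is the step that can demand lots of memory.
ALTER INDEX IX_Sales_ByCustomer ON dbo.Sales REBUILD;
```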

Using Enterprise features can require investing in Enterprise levels of memory.

Adding Memory Isn’t Cheating

Memory is an important tool for performance in SQL Server: it isn’t just something that covers up bad code, it’s often a solution to many different types of problems. Memory isn’t the only tool you need to help your queries go fast in SQL Server, but it’s certainly NOT a tool you should ignore.


23 Comments

  • I wholeheartedly agree with you here Kendra. I cannot control the way our developers develop – most DBAs simply do not have the time and/or bandwidth to look over all the code they deploy. Plus, most databases grow over time, and eventually additional resources are necessary to handle the increased demand/workload. My approach as a DBA is to tune the workload by harvesting sp_BlitzCache results and periodically reviewing those results for low-hanging fruit – that has really been a life saver (thanks to the Ozar team!). On occasion I do ask the developers to review certain procedures/statements – it’s a balancing act to say the least.

    Reply
  • A long time ago I was a pipeline construction worker. Whenever someone complained about the state of the equipment, they were reminded that “bad operators complain about their equipment. Good operators make bad equipment work.” I always remind myself of this when I want to complain that the server is too [insert problem here]. And that applies to memory.

    Now with the price of memory what it is, it seems in many cases throwing memory at a problem might be more cost effective. Sometimes it may be cheaper to add memory, compared to having a developer and a DBA spending days tweaking code, and finding that magic elixir index.

    Reply
    • These days, I think we can safely think of memory as duct tape. Should you go find the right materials to permanently fix whatever problem you’ve found? Absolutely. Can you duct tape something back together while you wait for materials? You bet!

      Reply
  • Loved this post! I just had to add 20 GB of memory due to scenario #1. The performance improvement was immediate and such a quick fix to a problem that wasn’t going to be resolved any time soon. We are using a 1 Gb iSCSI connection to our shared storage for our clustered servers. Our 78 GB OLTP database was not liking that!

    Reply
    • Perfect, clear and succinct answers on all 3 nicely identified illustrative scenarios. My own life savers are sp_BlitzCache AND sp_BlitzIndex when I want to dive a bit deeper into the data model and try to look and see if I can send some useful pointers to developers sitting somewhere far, far away (these days who knows where they are)…nonetheless, dev teams do look at these and some have made improvements/changes to their code and data model by going back to their dev cycles. Especially true when an application has tons of in-house developed and supported code. Vendor code running inside the db is a different issue, and a lot more political due to SLAs, etc. I get much better reception and mileage when I deal with apps that are built and supported by teams who I can get hold of, at least at the team-lead level.
      Last but not least, I don’t wish to overlook the bread-and-butter sp_AskBrent and sp_Blitz procs, which deserve to be recognized by every DBA who troubleshoots issues and crises. I see a new stored proc for Reporting Services is just out (and will likely evolve further over time like everything else), and now I can’t wait for another one that could perhaps look at Analysis Services related work that happens inside the database, such as cube build jobs processing partitions. These stored procs, once you learn how to use and interpret them correctly, make any DBA come out looking sharp :-). Thanks to the Brent Ozar team for these killer utilities; please keep up the good work.

      Reply
  • Somebody mentioned that a lot of memory makes SQL Server come up with a different execution plan, which is more memory intensive and might be less efficient from a performance perspective.

    Do you know any scenario when more memory is bad?

    Reply
    • Hi Slava,

      SQL Server might be able to give the query a more favorable (read: larger) memory grant due to having more available memory. In almost all cases, that’s a net positive – SQL Server is VERY good at managing memory. I’ve not seen a case where more memory caused a performance problem.

      There are a couple of schools of thinking when it comes to computing efficiency. The older definition of computing efficiency is 100% usage of all resources. An idle processor is doing nothing useful, and unused memory is a loss when you consider opportunity costs. However, you want those resources to be used for real work, and you’ll tend to see a limitation on one of those resources – thus, the ideal that we should make our code as efficient as possible before throwing more resources (memory) at the problem.

      Reply
        • JK – As I noted in my previous post, 9 times out of 10 I have had very little control over what the programmers author. Typically I’m outnumbered 10-20 to 1 (me), and we as DBAs have enough on our plate that we simply don’t have time to review all the code that is developed and deployed. That’s why a lot of us use tools like sp_BlitzCache to monitor and tune the databases we administer. If I see memory pressure and/or a low PLE, the most effective way to address that is to get more memory. Most of the time when system performance degrades, it’s no easy task to hunt down the offender(s) and tune them in a timely fashion, because we have customers that are getting increasingly frustrated. That is the ultimate opportunity cost to the business – keeping our customers happy. If you compare the value of one lost customer to the cost of purchasing memory, they pale in comparison. Also, I’m not sure who was ever comfortable with 100% utilization of resources – if I see my CPU go from 15% to 30% in a short period of time, that makes me uncomfortable. I never want to see my CPU utilization over 50-60% personally – that doesn’t give you much wiggle room for the very things we are talking about, such as new development, new products, new databases, etc. We often are the last folks to find out about all that… unfortunately.

        Reply
        • I only speak for myself on this one (I don’t know if the Ozar team entertains/supports this or not), but assuming you have the right version (Dev, Ent), compression can really help out when memory issues arise as well – this is assuming you have the CPU capacity. Just the other day I compressed a restored production database and it went from 133GB to 12GB. There is a lot of textual data in there, yes. But what that means is that if I have, say, 24GB RAM on the server, I’ve gone from only being able to cache roughly 25% of the data from being able to cache it all, and then some. YMMV on this option, but it has worked wonders for me because I typically don’t have servers loaded with RAM.

          Reply
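The approach described in the comment above can be sketched like this; the object names are illustrative, and it’s worth estimating the savings before committing to a rebuild:

```sql
-- Estimate the savings first.
EXEC sp_estimate_data_compression_savings
     @schema_name = 'dbo',
     @object_name = 'BigTextTable',
     @index_id = NULL,
     @partition_number = NULL,
     @data_compression = 'PAGE';

-- Then compress the table and its indexes. Compressed pages stay
-- compressed in the buffer pool, so more data fits in the same RAM.
ALTER TABLE dbo.BigTextTable REBUILD WITH (DATA_COMPRESSION = PAGE);
ALTER INDEX ALL ON dbo.BigTextTable REBUILD WITH (DATA_COMPRESSION = PAGE);
```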
    • Hi Slava,

      Great question. Here’s what came to my mind.

      There were some early versions of SQL Server 2005 where you might get large amounts of memory used by the execution plan cache, and it might not be useful.

      There have also been some hotfixes for Windows that you need to apply when you go over a certain threshold for memory and are running SQL Server to avoid performance issues. (We list some of these in our setup guide at BrentOzar.com/go/setup)

      So unfortunately, no change is safe from risk, not even adding memory!

      Kendra

      Reply
      • Kendra,
        I’ve just tried to do that research on my own.

        Here is an article:
        http://www.sqlconsulting.com/news1208.htm

        Unfortunately the author does not describe exact scenarios, but fortunately I do not have 1 TB of memory, so that is not my problem yet.

        Reply
        • Oh interesting! The link they mentioned to a “whitepaper” is broken, but I believe it was meant to go to this KB: http://support2.microsoft.com/kb/2413549

          Unfortunately, that KB doesn’t contain a lot of details. It does mention that there’s some issue with costing in some versions of the optimizer that can happen with “large” amounts of memory, but gives very few specifics about what amounts, what characteristics of the query, the plan, etc.

          The trace flag that is mentioned, 2335, can be used with QUERYTRACEON for testing purposes, or to apply to individual queries: http://support2.microsoft.com/kb/2801413

          I have worked with SQL Servers with 1.5+ TB of memory, and I can say that suddenly getting bad query plans after you add the memory is rare in my experience.

          Thanks for your comment and for that link!

          Reply
  • “SQL Server is designed so that in this case you can add more memory and reduce read IO to the storage, giving you more stable performance. It’s absolutely not cheating to give the server more memory — speeding up the storage would be much more expensive.” So does that mean that simply adding more memory, even if the server has a good amount already, will help reduce IO latency without making any SAN changes?

    Reply
    • If you have a “good amount already” is the key – what is a good amount? What is your PLE (page life expectancy)? That is a decent initial metric to determine if you have a good amount. You may have a 300 GB database, but only 30 GB that is really active and in the buffer cache for any period of time. But when comparing latency, RAM is much faster than even a SAN loaded with SSDs (if I recall, 8x or 10x faster – I cannot find the reference). Unfortunately Profiler doesn’t distinguish between logical (RAM) and physical (disk) reads – extended events are where you can get that data, or you can get it by setting “SET STATISTICS IO ON”, which also shows physical and logical reads.

      Reply
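As a small illustration of the SET STATISTICS IO approach mentioned above (the table name is illustrative):

```sql
SET STATISTICS IO ON;

SELECT COUNT(*) FROM dbo.Orders;

-- The Messages tab then reports something like:
--   Table 'Orders'. Scan count 1, logical reads 5000, physical reads 120, ...
-- Logical reads were served from the buffer pool; physical reads hit storage.
```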
      • In my case, for example, one of our systems processed around 5 million records, which are being replicated. On the subscriber I see average disk write latency > 30 ms, but my page life expectancy is 1895, so I was wondering: would adding more memory do anything?

        Reply
        • I would look at your wait statistics, and also other DMVs to identify what type of disk writes they are.

          For example, if your primary bottleneck is WRITELOG or transaction log writes, memory won’t help with that.

          Reply
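A starting point for that wait-statistics check, as a sketch (the exclusion list of benign waits here is far from complete):

```sql
-- Top waits since the last restart. A dominant WRITELOG wait points at
-- transaction log write latency, which adding memory won't fix.
SELECT TOP (10)
       wait_type,
       wait_time_ms,
       waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP',
                        'SQLTRACE_BUFFER_FLUSH', 'BROKER_TASK_STOP')
ORDER BY wait_time_ms DESC;
```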
  • I find the question (is it cheating? etc) interesting. We, like the Optimizer itself, have to do cost-based optimization to solve problems, and sometimes while we might think the “perfect solution” is to rewrite the code, the cost-based model has us basically look for a good enough solution in a reasonable amount of time…and often, that’s resources. We put in a Violin SAN right before a major go-live project principally because we knew we were about to get a lot of sub-optimal code dumped into production, and we had no way to fix it all in a reasonable amount of time. So throwing resources at it was our “good enough” plan, and the right course to take.

    By the same token, there are a lot of problems where you can kind of fix the symptoms with resources, but as your databases grow and volumes scale up…the problems with bad code reemerge. So the classic DBA “it depends” applies. Throw as much RAM as you can at a problem but also be proactively thinking ahead on how to fix a bad bit of code before it spirals out of control due to data growth.

    Reply
    • I totally agree that there has to be balance. I can write TSQL that’s so bad that it’s painfully slow even when all the data fits into memory and there’s plenty of processor resources 🙂

      Reply
  • There are two things I like to mention.

    First, your time is valuable and memory is cheap. Although I hate asking for money, it’s often faster and cheaper to up your memory than to fiddle around with PLE. If you get paid say $52,000 / year then it’s hard to justify tuning for 2 weeks straight costing $2,000 instead of a $1,000 memory upgrade that may give similar results without pushing off other priorities. This is assuming your PLE is lower than you’d like, you have high waits on PageIOLatch_SH, more room for memory in your box, a workload that would use the extra cached data, etc. (do your research, know your server).

    Yes, you should look for index scans, excessive key lookups, data purging opportunities, duplicate indexes that can be combined, and everything else that takes your time, doesn’t happen overnight, and requires more testing and risks than adding memory if it fits into your goals and priorities. Until then, the people who are paying the bills are waiting for their screen to load.

    Second, no matter what you do there’s a good chance your data is going to continue growing. You try to purge data to keep it in check, but the business keeps making cases that they need to keep more. Upgrades expand functionality by adding another couple fields to your biggest table and creating another couple huge tables to manage. What you’re trying to take care of is growing and there’s no shame in hardware growing, too.

    I love tuning. It’s how I relax after a stressful day. Getting users to call up complaining that they know their screen isn’t working right because it loaded too quick is a goal in my life. If someone like me isn’t ashamed to buy memory then no one should be.

    Reply
