All databases are not created equal.
Over the last couple of years, IT management pressured database administrators to cut hardware costs and licensing costs. We had a few choices:
- Consolidation – take databases that used to live on separate SQL Server instances and merge them onto one
- Instance stacking – install multiple SQL Server instances onto less operating systems
- Virtualization – keep the SQL Server instances separate, but merge their OS’s onto less hosts by turning them into virtual servers
All of these have their pros and cons, as I discussed with Kevin Kline and Ron Talmage on a Quest webcast last year. (You can watch the recorded webcast for free.)
The bottom line, though, is that all of them end up with multiple databases sharing less physical hardware. The more databases we have on a server – whether virtual or physical – the more we have to monitor the resource utilization of each database. SQL Server doesn’t know which databases are more valuable to the business, or if one database is supposed to be more active than other databases.
When Tuning, Look for Greedy Databases
When I start tuning a server with more than a handful of databases, I like running Louis Davidson’s file stats DMV query. It lists each database file and the basic IO stats like number of reads & writes, size of reads & writes, and IO stalls. I don’t look for specific thresholds, but rather an imbalance – is one database dominating the IO?
If I see one database absolutely dwarfing the read/write statistics of other databases, I bring that to the attention of the server manager. To drive the point home, I copy the query output into Excel and whip up a quick graph. In the example shown here, one particular database was responsible for 88% of the reads on the server – more than all of the other >80 databases combined.
I’ve changed the database names to protect the innocent, but the one database was a enterprise-wide monitoring program’s repository (and no, it wasn’t a Quest product.) While that database did have a legit business value, it didn’t justify draining the server’s resources. In comparison, the Sales and SalesHistory databases (not really their names, but renamed for illustration) only used around 3% each. Those two databases had a vastly higher business value, and they were suffering due to the monitoring application.
The IT team had done the right thing by consolidating databases down to less servers, and when they needed a home for the new monitoring software repository, they used a shared instance. At first everything went fine, but over time the repository’s needs dwarfed even the mission-critical sales application.
The problem turned into more than just storage; since the monitoring app was doing a stunning amount of reads, it was pushing memory out of SQL Server’s buffers. The page life expectancy metric – the length of time data stays in the buffer cache – was averaging under 10 seconds. For reference, Microsoft recommends this metric be 180 seconds (3 minutes) or higher. Clearly, we were running into a problem with this database.
So how do we mitigate it? In my Perfmon & Profiler 101 presentation, I talk about defining “mitigations”, not fixes, because sometimes we can only afford to duct tape the problem.
Option 1: Fix the Database
Ideally, I’d pop the hood on the database, rework the indexing strategy, and help improve the performance of the queries. However, this was a third-party application with a sealed hood – no query changes here.
We could build a list of recommended changes for the vendor, but you know how those vendors are. They never listen to anybody, do they? <sigh> I know, I know.
Option 2: Move the Database to a Standalone Server
Since the database was a non-mission-critical app, we could build a new SQL Server – preferably a Standard Edition box (as opposed to the pricey Enterprise Edition box it was sharing) with cheap storage. No matter how much money that we’d throw at the hardware, the application still wouldn’t perform well, so it wouldn’t make sense to buy an expensive box.
We could also move it to a new virtual server and use our hypervisor’s resource throttling to minimize the impact on other servers. Some hypervisors can set a “max” amount for CPU, memory and storage throughput for a given virtual server.
Option 3: Move the Database to a New Instance
We could install a separate instance of SQL Server onto this existing server and set its max memory to just a couple of gigabytes. That way, it would only penalize itself for bad performance. We should also use a separate drive array for this database so that its IO performance wouldn’t take down other databases.
Unfortunately, since this server was on the SAN, this database’s IO would still be traveling over the same Host Bus Adapters (HBAs) that the other databases used to access their data. Depending on the quantity of IO, this could still present a bottleneck.
Option 4: Throw Money at the Shared Server
We could upgrade the bejeezus out of the memory and storage used by the SQL Server. This would speed up the monitoring application along with the other applications on the server. However, since the monitoring app ran 24/7, the other apps would never really see the full benefit of the additional hardware.
Which Option is Right for You?
Making the right choice requires knowing the business value of the resource-intensive database. If you don’t know what the database is used for, email your management with a copy of your pie chart and let them decide.
In this particular case, the client chose to move the database to its own server. The results speak for themselves – the server did 93% less reads per second on the data drive, meaning less IO saturation:
And page life expectancy, the length of time data stays in memory after it’s read off the disk, improved almost 100x:
The server was suddenly more responsive, more usable, and the mission-critical applications started flying again. Mr. Nastydatabase was confined to his own sandbox, hurting no one else but himself until he could learn to play well with others.