Is your SAN’s cache killing tempdb?

Last Updated 7 years ago

Let’s start with definitions

Many SANs have caching built in. What kind of cache is important, because if you’re dealing with non-SSD storage underneath, you could be waiting for a really long time for it to respond.

Let’s start with some definitions of the most popular caching mechanisms available for SANs. I’m not going to say ‘only’, because some vendor out there might have some proprietary stuff going on that I haven’t heard of.

Write-through: Much like synchronous Mirroring or AGs, writes have to be confirmed twice. They’ll write to the cache, but they’ll also write to the underlying disks, and then throw a secret handshake saying that it’s committed and all is well. This SUCKS if your underlying pool of disks are slow, saturated, or otherwise abused.

Write-around: Basically does this to the cache, skipping it entirely, and writing to disk. This can be fast, but then any data you write directly to disk will have to be read into cache when something needs it. If your application relies heavily on recent data, this can be a really lousy choice.

Write-back: Like good ol’ asynchronous commits, this writes to the cache, says everything is cool, and eventually writes it to hard storage. That means your most recent data is in cache and available, but maybe not on the most stable ground just yet. If you have slow disks underneath, and the power goes out before it writes to them, you could potentially lose some data here, unless your cache has some resiliency built in. So be careful what you wish for, here.

Why tempdb?

Because you people beat so much tar and sand out of it that you’re either going to strike oil or find a new dinosaur. If writes here are slow; if SQL is waiting more than 1 second for data to just write out to here, all of your subsequent reads are at the mercy of those writes.

What’s the sense in tuning a query that will always have overhead writing to tempdb?
Users complain that inserts are slow because you have a trigger (you know those use tempdb, right?) that stalls out for three seconds at run
Your maintenance (DBCC CHECKDB, indexes sorted in tempdb) can’t finish because your tempdb write stalls are the envy of only a Gutenberg Press.

The moral of the story

If you’re using local storage, there’s no excuse for not going SSD.

If you went out and got yourself an expensive SAN, and now you can’t afford to put good drives in it, you’re SAN poor and you made a bad choice.

If you run tools like CrystalDiskMark or DiskSpd, you’re using SQL’s DMVs to check on disk performance, or your monitoring tool is showing bad write latency, check to see what kind of caching you’re using. Start asking questions about the underlying drives, the SAN connections, and ask for numbers from your SAN admin. Downloading more RAM won’t fix slow writes!

Thanks for reading!

Brent says: Intel’s speedy 400GB PCI Express SSDs are down in the $700 range. Just do it.

3 Comments. Leave new

John Nelson #2
June 7, 2016 1:07 pm

$340
http://amzn.com/B0153P9SEK

Reply
Roman Brauchle
June 9, 2016 3:10 am

When I tested the performance of disks I always used Write-through as I think I was told SQL Server always uses Write-through. Can SQL Server tell the SAN which method to use or is it always doing what the SAN will provide?

Should I think about different caching methods based on DB, LOG, TempDB?

Reply
- Brent Ozar
  June 9, 2016 6:24 am
  
  Roman – no, SQL Server can’t tell the SAN what to do. For different caching methods, check out your SAN vendor’s best practices – they give detailed guidance: https://www.brentozar.com/go/san
  
  Reply