SQL University: The Basics of Storage

A while back, when I was working as a DBA (before I became a consultant), my Storage Area Networking (SAN) admin decided to leave the company.  I volunteered to take on his job duties because I wanted the money, er, wanted to further my education.  I learned a lot through some very hard lessons, and I consider myself very lucky to have had that opportunity.  It’s really hard to get your foot in the door as a SAN admin, managing millions of dollars in hardware holding priceless data, and it’s just as hard to get quality SAN training.

This week, I’m going to try to bring you up to speed on how SQL Server and storage interact with each other.  This isn’t meant to be a doctorate-level course in SANs, but rather, a simple series that will teach you what you need to know in order to do your job better.   If you like what you see here, you might want to pick up Professional SQL Server 2008 Internals and Troubleshooting.  The book is targeted at DBAs who need to know more about what’s happening in and around SQL Server.  I wrote the storage chapter.

Today’s session will be about the very basics of storage.  I’m going to simplify things in order to help you see the big picture.  The experts in the crowd will notice that I’m leaving out things like snapshot isolation, multiple data files, and parallelism, but hang in there – we’ll get to some of those issues before the end of the week.  Alright, let’s get started!

What SQL Server Stores

At their simplest, SQL Server databases consist of two things: a data file and a log file.  The data file holds our tables, stored procedures, views, and so forth.  The log file holds a running play-by-play analysis of what SQL Server is trying to do at any given time.
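
Want to see this for yourself?  Here’s a quick query against the sys.database_files catalog view that lists the files behind whatever database you run it in:

SELECT name, type_desc, physical_name
FROM sys.database_files;  -- type_desc shows ROWS for data files, LOG for the log file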

When one user issues a SELECT statement against the database to read data, the SQL Server engine’s response is fairly straightforward.  It reads the data it needs from the data file.

When users INSERT, UPDATE, or DELETE data, however, things get a little more complicated.  SQL Server needs to make sure that two people can both access the data simultaneously and safely.  It also needs to guarantee that if the power plug gets pulled in the middle of an UPDATE statement, our data will be safe when the server restarts.  The engine has to be able to roll back (or roll forward) transactions that had not yet completed.

Want to learn more about this?  Drop ACID.  The concept of Atomicity, Consistency, Isolation, and Durability is the foundation for heavy-duty relational databases.

This is where the log file comes in.  In simple terms, when a user inserts a record, the SQL Server engine does three things (there’s a code sketch after these steps):

  1. Writes an entry to the log file recapping what it’s about to do
  2. Writes the new record into the data file
  3. Writes an entry into the log file saying it’s done mucking around in the data file
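
To make those steps concrete, here’s a minimal sketch using a hypothetical dbo.Accounts table – the comments map roughly onto the three steps above:

BEGIN TRANSACTION;    -- step 1: SQL Server logs what it’s about to do
INSERT INTO dbo.Accounts (AccountID, Balance)
VALUES (42, 100.00);  -- step 2: the new record lands in the data file
COMMIT TRANSACTION;   -- step 3: a log entry says the work is complete

(In reality, the changed data page can sit in memory until a checkpoint writes it out, but the log entries get written in this order – which is exactly why the log matters so much in a moment.)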

As often happens, let’s assume that a drunken sysadmin held down the power switch on our database server at the exact moment our SQL Server was busy performing step #2.  After he hit it again to power the server back on, the SQL Server engine would start up and review the log.  It would notice the logged entry from step #1, notice that there wasn’t a matching log entry for #3, and then go poke around in the data file to see whether or not the change had been made.  I don’t want to advocate drinking in the datacenter (if you see your sysadmin drinking, make sure to take his rack keys away), but I do have to say that SQL Server is amazingly resilient.  I’ve seen servers lose power or lose connections to their storage in the middle of very heavy activity, and yet they’ve powered back up fine.  (If your server gets into a situation like this, though, there are precautions you’ll want to take, and we’ll address those later in the week.)

[Image: A Fill Factor of .10 or Above is Unsafe]

The drawback of this safe system is that it’s constantly writing to the log file.  Looking at our three-step example, two of the three writes involve the log file.  The more often we do DUIs (Deletes/Updates/Inserts), the faster our log file drives need to react.  When we’re configuring hardware for SQL Server, we want to consider two things (see the sketch after this list):

  • Put the data and log files on separate drives
  • Put the log files on the fastest-writing drives we can afford
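
Here’s a sketch of what those two rules look like at database creation time.  The database name, drive letters, and paths are all made up for illustration:

CREATE DATABASE SalesDB
ON PRIMARY
    ( NAME = SalesDB_Data, FILENAME = 'D:\SQLData\SalesDB.mdf' )   -- data file on its own drive
LOG ON
    ( NAME = SalesDB_Log, FILENAME = 'L:\SQLLogs\SalesDB.ldf' );   -- log file on the fastest-writing drive we can afford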

In a perfect world, we’d put everything on the fastest drives we can afford, but in this perfect world I would be eating bacon instead of teaching you bozos, and bacon would prevent heart attacks.  Back here in the real world, though, we have to take Lipitor and cut back on our budgets.  We can often get away with a little less speed on the data drives if we can use the SQL Server’s memory to cache queried data and avoid hitting the drives for every read, but we can’t use that memory to cache log file writes.  Furthermore, the log file isn’t the only thing that wants its own storage.

Son, Your Server is Writing Data Your Memory Can’t Cache

SQL Server does a pretty good job of using memory to avoid hitting storage.  Unfortunately, in addition to log files, two other parts of SQL Server also like to do a whole lot of writing at the exact same time we’re accessing the data file: TempDB and indexes.  When these things need to write data, they need to write it NOW, and we can’t throw memory at the problem to make it faster.

TempDB is a system database that ends up a little like your kitchen junk drawer; it’s got all kinds of random stuff in there that isn’t really related to each other.  Users write queries that create temporary tables in here, SQL Server uses it for big sorting operations, and the engine even uses it for some special transaction modes.  In the case of temp tables, for example, users will write something like this made-up (but typical) example:
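
-- Made-up table and column names, but a very common pattern
SELECT CustomerID, SUM(OrderTotal) AS TotalSpent
INTO #BigSpenders         -- the temp table lives in TempDB
FROM dbo.Orders           -- reading from our database’s data file
GROUP BY CustomerID;

SELECT *
FROM #BigSpenders
WHERE TotalSpent > 1000;

DROP TABLE #BigSpenders;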

The important thing to note here is that SQL Server will be doing two things simultaneously:

  • Reading from the data file
  • Writing to the TempDB data file

This is a fairly typical use for TempDB, so when managing a SQL Server with TempDB-intensive queries, it makes sense to isolate TempDB onto its own separate hard drives.
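
Relocating TempDB is an ALTER DATABASE operation.  This sketch assumes a dedicated T: drive and the default logical file names (tempdev and templog); the change only takes effect after the SQL Server service restarts:

ALTER DATABASE tempdb
MODIFY FILE ( NAME = tempdev, FILENAME = 'T:\TempDB\tempdb.mdf' );

ALTER DATABASE tempdb
MODIFY FILE ( NAME = templog, FILENAME = 'T:\TempDB\templog.ldf' );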

Indexes present a similar problem.  If we’ve got a table with five indexes, then every time we add a record to our table, SQL Server also has to add five other records – one to each of our indexes.  That’s a lot of writing, but rather than adapting our storage design, I recommend that DBAs control their index designs.  Use this SQLServerPedia query to find unused indexes, because dropping them will instantly result in faster DUIs – and who doesn’t want that?  Dropping indexes is cheaper and easier than micromanaging multiple arrays to handle data and indexes, because indexes aren’t as easy to isolate as the log files or TempDB files.
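
The gist of that kind of query is to find indexes that get written to but never read.  Here’s a simplified sketch of the idea using the sys.dm_db_index_usage_stats DMV – keep in mind its counters reset when SQL Server restarts, so don’t drop anything based on a server that just rebooted:

SELECT o.name AS TableName,
       i.name AS IndexName,
       s.user_updates AS WritesWePaidFor,
       s.user_seeks + s.user_scans + s.user_lookups AS TotalReads
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
  ON s.object_id = i.object_id AND s.index_id = i.index_id
JOIN sys.objects AS o
  ON i.object_id = o.object_id
WHERE s.database_id = DB_ID()
  AND o.is_ms_shipped = 0
  AND i.index_id > 1                                      -- nonclustered indexes only
  AND s.user_seeks + s.user_scans + s.user_lookups = 0    -- written to, never read
ORDER BY s.user_updates DESC;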

So we’ve identified several things that might need separate storage due to their constant simultaneous access:

  • Data files
  • Log files
  • TempDB
  • Indexes (plus a hidden one, full text indexes)

In our next class this week, we’ll look at our storage options – the places we can put these files, and how each of them may perform differently.

Continue to Part 2: Where We Store Stuff
