05. Computed Columns (10m) - Brent Ozar Unlimited®

So far, we’ve done a great job of being both awesome and lazy, but we can take that a step further. We want to keep the rules of our Crossbows and Cursors games simple, and one of the rules that we’ve decided on is that a player’s hit points are determined by their level. Specifically, their hit points will be ten times their current level. A level two character would have 20 hit points, a level four would have 40, and so on.

[wpc_button]

The objective here is to create a player hit points column that is basically ten times the player level column. How do we want to accomplish that? We have a few options. First, we could create a view where we simply take the player level, multiply it by ten, label that column as being the player hit points, and then everyone can refer to that column in the view. The only drawback is, everyone has to refer to that column in that view in order to get that data.

Hmm, so what else can we do? We could make a column in that table and then update the column every time the player’s level increases. If the player goes up to level 4, the hit points go up to level 40. Hmm. That sounds like work. What if we defined a column in the table and said that that column had to always be ten times the player’s current level? That might work, because that way we don’t have to point anyone to a view, and we never have to worry about running an update statement or forgetting to run an update statement. That sounds efficient. Sounds nice and lazy. I think I can get behind that. What we really need here is a computed column.

Before we go about creating our computed column, first we need to understand a couple of underlying concepts. [00:02:00] Concept number one is determinism. Determinism simply describes whether or not, when you ask the same question of the same data at any time, is the result going to be the same? If the answer to the question never changes, regardless of when you ask it, it could be today, next week, next month, next year, then it’s sad to be a deterministic question. If the question’s answer varies depending on when you ask, so today could yield a different answer from a year from now, then that’s non-deterministic.

An easy way to remember this is that a deterministic question is determined to hang on to whatever the answer it is that it has for you. A non-deterministic question is more noncommittal, just depends on when you ask it. Let’s take a look at some deterministic and non-deterministic questions.

What’s my first initial? It’s D, isn’t it? No matter when you ask that question, it always will be D. That’s a deterministic question. Here’s another one. How many characters are in my street address? Again, as long as I haven’t moved, it doesn’t matter when you ask that question. The answer will always be the same. That’s deterministic. One more. If I have six chickens and six people, how many chickens do I have per person? Assuming the people aren’t being eaten, then it’s always going to be one chicken per person.

Here’s a non-deterministic question. How old am I? Now that depends, doesn’t it? While my birthday is static, today’s date is ever-changing, as is my age. If you ask how old I am, [00:04:00] it depends on what today is. If you ask me that question next year, you’ll get a different answer, won’t you? In that sense, it’s non-deterministic. Here’s another one. I’m thinking of a number between one and one hundred. What number am I thinking of? The problem here is that, unless I’m a complete moron, I’m always going to choose … Hey! That’s uncalled for. I’m always going to choose a different number, because I don’t want you to figure out what number I’m thinking of, do I? With any sort of randomization, that’s also going to be non-deterministic. It will depend on when you ask.

Another question. How many people are taller than me? Hmm. That depends, but it doesn’t really depend as much on time, it depends on how many other people we’re talking about. This is tricky, because it seems like I should be able to nail down the answer, just by knowing about the other rows in the table, right? The problem is I can’t really do that, because the number of rows in the table that we’re talking about could change. Today’s answer could be different tomorrow, not because tomorrow is tomorrow, but because since then, more rows have been added, or rows have been removed. In that case, that’s also non-deterministic.

It sounds more complicated than it really is. Just remember that if the answer is, “It depends,” then it’s non-deterministic. Otherwise, it’s probably deterministic.

Now we know the difference between a deterministic and a non-deterministic question, but what’s the difference? Why do we care? This all seems academic so far, doesn’t it? It does make a difference, and the reason is only a deterministic question can be persisted. Persistence is our second concept around computed columns. [00:06:00]

Whether or not a column is persisted means whether or not the formula’s result is stored in the table itself, or if the formula has to be calculated on the fly every time that column is referenced. Only a deterministic question can be persisted. The reason is pretty straightforward. If a value can change over time, and perhaps at any time, SQL Server can’t possibly keep up with that. It’ll say, “I don’t know when this value’s going to change. I can’t possibly store it.”

Why is storing a value important? If a value can be stored, that value can also be indexed. That, my friend, can make a huge difference in terms of the performance of your query. You can store a computed column that’s persisted in an index, either as an indexed field or as an include. Don’t sweat it if you can’t persist that data, or even if you don’t decide to create an index at all, because you still get a nice benefit from SQL Server seeing a computed column there. SQL Server will create statistics around that computed column and use those to come up with a better execution plan than it would have if it didn’t have a computed column there at all.

There are some limitations and downsides that you need to know about as well when it comes to computed columns. Computed columns aren’t able to do things that are table aware. They can only focus on that one row. Things like rank, row number, average, lag, lead, anything that looks beyond that current row is off-limits to computed columns. Another downside is, if you create a computed column and it spins out a divide by zero error, that will actually be an error. It won’t handle that as a zero or a null, or anything like that. You’ll probably have to build a case statement around that to prevent that from happening. [00:08:00]

Finally, if you have CHECKDB hitting an index with a computed column, it can slow down. Way down. That’s something you need to be on the lookout for. Don’t let these downsides dissuade you from using computed columns. The benefits vastly outweigh the drawbacks.

Whew, man. That was a lot to think about. Keeping all that in mind, let’s get back to the question at hand. That is, how do we want to store player hit points in our database? Having gone through all of this, we’ve decided that the answer is to use a computed column. Let’s go ahead and set that up. First we’ll do an ALTERTABLE dbo.Player, and then we’ll ADD and the name of our new column, we’ll call it PlayerHP. Then we need to define it, so we’ll say AS(PlayerClassLevel*10). Finally, because we know this is deterministic, we can actually store the data. We’ll tell it to do PERSISTED. Once we’ve made this change, that column will get computed automatically.

Now let’s take a look at what it looks like in Management Studio once it’s already been established. You can see at the bottom we have our new PlayerHP column. It says it’s computed, and there’s even a little calculator symbol that goes along with it. If we look at the Table Designer, we’ll see a few things that are interesting. One, our Computed Column Specification, or definition, will be there, and will say “Is Persisted.” Also down below it tells us if it’s deterministic or not. If we go ahead and query that information, we’ll see that our player hit points are indeed populated the way we expect them to be.