Database professionals say, “That’s slow because it’s a size-of-data operation.”
A size-of-data operation is when the database has to write all of the affected data again. What you’re asking for might seem small, but for the database, it causes a lot of work.
For example, let’s say you live in a strict household, and your parents have told us that under this roof, everyone’s hair can only be black, blond, brown, or red. (Don’t blame me. I don’t make the rules. They’re your parents, not mine. Go talk to them. And tell your mom I said hi.)
If your parents suddenly loosen up and say, “We’re now allowing you to dye your hair pink,” that doesn’t mean we have to touch everyone’s head. It’s a metadata operation: we’re just making a new note so that when someone walks into the door and we check their hair color, we have a new acceptable option.
However, if your parents freak out and say, “Everyone’s hair must be pink,” that means we’re gonna need to dye every person’s hair. We’re gonna have to gather everybody together and do some real work.
That’s a size of data operation: however many people live in the house, we’re gonna need to work on all of ’em.
How that translates into databases
I’ll use the Users table in a large Stack Overflow database to explain.
If I alter a table and add a new nullable column, it happens instantly, and SQL Server doesn’t have to touch any pages:
Adding a column like that is just a metadata change. It tells SQL Server, “When new rows walk into the house, they’re allowed to specify their HairColor now. When rows leave the house, if they don’t already have a HairColor set, just say it’s NULL.”
However, if I populate that column and dye everyone’s hair pink, that’s gonna take some time, because we’re gonna have to touch every row:
That’s a size-of-data operation, so it’s slower.
How that affects development & deployment
When you’re just getting started developing an app, and all your tables are empty, you don’t really have to worry about this kind of thing – because size-of-data operations are fast when there’s, uh, no data involved.
But the larger your data becomes, and the slower your storage becomes (like if you move from nice on-premises flash storage up to the cloud), then the more you have to pay attention to this. You’ll want to run your deployment scripts on a real production-sized database, not just an empty development shell, and time the deployment scripts. The longer an operation takes, the more likely it is a size-of-data operation. On larger databases – at 100GB and higher – and at 24/7 uptime shops like online stores, you’ll likely have to tune your deployment scripts to avoid size-of-data operations.
I haven’t seen a good rundown of all the operations that are size-of-data operations for Microsoft SQL Server, but it’s changed over the years. Hey, if you’ve got a blog – there’s a potential idea for you! Not me though. The hot tub is calling.