You’ve got an existing application with a database back end. You’re thinking about changing the database, and you don’t wanna break stuff.
The most important thing to understand is difference between constructive and destructive changes, also known as additive and destructive changes, or non-breaking and breaking changes.
Constructive change examples: adding a new table or view, adding a new nullable column to an existing table, adding a new optional parameter to an existing stored procedure or function. These changes should never be made in a way that breaks anything that queries the database.
Destructive change examples: dropping a table or view, removing or modifying an existing column, changing an existing parameter. These often break apps that query the database.
Constructive changes give your code new options.
Destructive changes give your code new requirements.
Let’s start with the Stack Overflow Users table.
The public data dump version of the table has a column for Age, and let’s pretend that’s the current state of our application. Our application requirements have changed, and now we want to:
- Store Birthdate instead of Age so users don’t have to log in and change it all the time
- Calculate Age on the fly in the front end
- Default everyone to a Jan 1 birthdate that would match to their current Age
- Force them to set their correct birthdate on the next login
A destructive set of database changes would be to:
- Add a new non-nullable Birthdate column, with the YYYY/01/01 birthdate that works for their current Age.
- Drop the Age column.
- Require the Birthdate column to be entered whenever users edit their account, because our architect has mandated that the column is mandatory.
- Change the front end user-profile-edit screen to have a Birthdate field instead of Age.
All of that would have to be coordinated to happen at the same time, which means we would probably need an outage. That kind of choreographed dance is painful: sometimes it’s just not easy to change the database AND the app at exactly the same time.
Instead, a constructive set of changes might be:
1. Add a new nullable Birthdate column, and add a trigger to populate it. When someone inserts or updates a User row:
- If a newly changed row’s Birthdate is set to null, set it to a YYYY/01/01 birthdate that works for their chosen Age.
- If the Birthdate is not set to null – meaning, the app has been modified to pass in a valid Birthdate – then set the Age column’s value based on the Birthdate. Discard whatever Age was passed in – we’re going to be calculating it live down the road anyway.
The advantage of a constructive change like this is that from this moment forward, the application code can be changed at any time to either update Birthdate or Age. Just make sure you write your trigger to handle multiple rows.
2. Change the application to populate the Birthdate column instead of Age. Note that we could do either step 2 or 3 first, or even simultaneously. There are no dependencies between these two.
3. Backfill existing user Birthdate values. In step 1, we allowed the column to be nullable because that way, adding the column was a super-quick operation that didn’t require writing a ton of data to the clustered index. SQL Server can add nullable columns with only a brief schema lock. Now, let’s go back and run an UPDATE statement to populate null Birthdates with the YYYY/01/01 value. If the table was too large and workload too continuous to allow a blocking operation, we can nibble through the rows 1,000 at a time to avoid lock escalation.
4. Change the Age column to be computed based on their Birthdate. After #3 completes, all of the Users have a Birthdate. It might not be their actual birthdate – over time, hopefully folks will log in and change their birthdate to be accurate. (But they might not.) From this point forward, we need to start automatically calculating their Age. There are a few ways we could do this, but my favorite would probably be:
- Rename the Age column to be Age_Deprecated. This way, we don’t have to rewrite the table’s clustered index to remove the old Age column and its data, which would be disk-intensive.
- Add a new non-persisted computed column named Age. Because it’s non-persisted, this is also a super-quick, not-size-of-data operation.
At this point, any apps that still read the Age column can keep right on truckin’ with no changes. Any apps that try to write the Age column should simply have their proposed changes discarded by the trigger – the insert/update should still work, but just not affect the Age column’s contents directly.
Yes, constructive changes are more work for you.
The YOLO development pattern would be easier: just type BEGIN TRAN, change your tables, and send a Slack message to say, “Yo, sure hope the app handles that change I just made.” In the beginning of your career, working with small apps where you’re the only developer, the YOLO pattern is just fine. As your career and applications grow, though, you find the need for constructive changes only.
I’m only scratching the surface of the work involved. Things get more complex when you’re changing columns, like say breaking a UserName column into separate FirstName and LastName components. (Which, frankly, is a bad idea anyway.) To learn more about constructive changes: