How to Make Database Changes Without Breaking Everything


You’ve got an existing application with a database back end. You’re thinking about changing the database, and you don’t wanna break stuff.

The most important thing to understand is difference between constructive and destructive changes, also known as additive and destructive changes, or non-breaking and breaking changes.

Constructive change examples: adding a new table or view, adding a new nullable column to an existing table, adding a new optional parameter to an existing stored procedure or function. These changes should never be made in a way that breaks anything that queries the database.

Destructive change examples: dropping a table or view, removing or modifying an existing column, changing an existing parameter. These often break apps that query the database.

Constructive changes give your code new options.
Destructive changes give your code new requirements.

Let’s start with the Stack Overflow Users table.

The public data dump version of the table has a column for Age, and let’s pretend that’s the current state of our application. Our application requirements have changed, and now we want to:

  • Store Birthdate instead of Age so users don’t have to log in and change it all the time
  • Calculate Age on the fly in the front end
  • Default everyone to a Jan 1 birthdate that would match to their current Age
  • Force them to set their correct birthdate on the next login

destructive set of database changes would be to:

  1. Add a new non-nullable Birthdate column, with the YYYY/01/01 birthdate that works for their current Age.
  2. Drop the Age column.
  3. Require the Birthdate column to be entered whenever users edit their account, because our architect has mandated that the column is mandatory.
  4. Change the front end user-profile-edit screen to have a Birthdate field instead of Age.

All of that would have to be coordinated to happen at the same time, which means we would probably need an outage. That kind of choreographed dance is painful: sometimes it’s just not easy to change the database AND the app at exactly the same time.

Instead, a constructive set of changes might be:

1. Add a new nullable Birthdate column, and add a trigger to populate it. When someone inserts or updates a User row:

  • If a newly changed row’s Birthdate is set to null, set it to a YYYY/01/01 birthdate that works for their chosen Age.
  • If the Birthdate is not set to null – meaning, the app has been modified to pass in a valid Birthdate – then set the Age column’s value based on the Birthdate. Discard whatever Age was passed in – we’re going to be calculating it live down the road anyway.

The advantage of a constructive change like this is that from this moment forward, the application code can be changed at any time to either update Birthdate or Age. Just make sure you write your trigger to handle multiple rows.

2. Change the application to populate the Birthdate column instead of Age. Note that we could do either step 2 or 3 first, or even simultaneously. There are no dependencies between these two.

3. Backfill existing user Birthdate values. In step 1, we allowed the column to be nullable because that way, adding the column was a super-quick operation that didn’t require writing a ton of data to the clustered index. SQL Server can add nullable columns with only a brief schema lock. Now, let’s go back and run an UPDATE statement to populate null Birthdates with the YYYY/01/01 value. If the table was too large and workload too continuous to allow a blocking operation, we can nibble through the rows 1,000 at a time to avoid lock escalation.

4. Change the Age column to be computed based on their Birthdate. After #3 completes, all of the Users have a Birthdate. It might not be their actual birthdate – over time, hopefully folks will log in and change their birthdate to be accurate. (But they might not.) From this point forward, we need to start automatically calculating their Age. There are a few ways we could do this, but my favorite would probably be:

  • Rename the Age column to be Age_Deprecated. This way, we don’t have to rewrite the table’s clustered index to remove the old Age column and its data, which would be disk-intensive.
  • Add a new non-persisted computed column named Age. Because it’s non-persisted, this is also a super-quick, not-size-of-data operation.

At this point, any apps that still read the Age column can keep right on truckin’ with no changes. Any apps that try to write the Age column should simply have their proposed changes discarded by the trigger – the insert/update should still work, but just not affect the Age column’s contents directly.

Yes, constructive changes are more work for you.

The YOLO development pattern would be easier: just type BEGIN TRAN, change your tables, and send a Slack message to say, “Yo, sure hope the app handles that change I just made.” In the beginning of your career, working with small apps where you’re the only developer, the YOLO pattern is just fine. As your career and applications grow, though, you find the need for constructive changes only.

I’m only scratching the surface of the work involved. Things get more complex when you’re changing columns, like say breaking a UserName column into separate FirstName and LastName components. (Which, frankly, is a bad idea anyway.) To learn more about constructive changes:

Previous Post
It’s Friday. I’m Not Answering These 14 Office Hours Questions.
Next Post
[Video] Office Hours: As They Pour My Patio Concrete Edition

8 Comments. Leave new

  • Using ADO you cannot make Age computed column *and* allow updates through ADODB.Recordsets as it fetches base table metadata and knows which columns are readonly at client-side.

  • A lot of the principles on not breaking code i got from API development where you sometimes dont know who’s app you are going to break if you are not careful. That included only adding optional parameters and never removing any.

    But good design of your application (including database schema) should not have the need for change in the near feature build right into them. A column like age is gonna give you trouble later on. Products have a price that is current for a certain period. People in an organization will change roles. Cars will change ownership. Haircolour goes grey eventually. If things are changing over time your application should be able to handle these changes appropiately. If the changes dont matter recording the original data probably doesnt matter as well.

    • Exactly, same here! I cut my teeth on API development as well.

    • A thing that bugs me a lot about SQL development is how poorly adhered to basic standards are and, in the database, it is arguably the most important place to adhere to standards given how difficult it is to change anything at that level.

      Its not that development in other languages is perfect, but any decent developer in something else will recognize crap design or code as crap. In SQL Development it seems that only the most elite adhere to standards while many decent SQL devs will justify things like adjacent aliasing as ‘just the way I learned and like,’ or defend poor normalization as ‘we only need 3nf, dont need to get out of hand’ when implementing something that is 0nf, justifying inconsistencies in style because it provides something only in the areas of inconsistency at the current point in time and in the future may break something.

  • James Russell
    July 30, 2023 8:51 pm

    I’d say adding a new column to a table, nullable or nonnullable is a breaking change as you can’t control how people select * from a table in their query, be it automated or not.

  • Jonas Gunnarsson
    August 1, 2023 3:13 pm

    The usage of extended properties can also be useful, when changing schema for a database. I use ‘Deprecated_IsError’ and ‘Deprecated_Message’, to inform, others of deprecated objects.


Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.