This week, we’re all about foreign keys. Yesterday’s post covered scripts to set up the Stack Overflow database to get ready, eliminating data that would violate FK relationships.
You Had Two Jobs!
Let’s say I wanted to implement Foreign Keys to do two things
- If a user deletes their account, all of their badges, comments, and posts will also get deleted
- If a user deletes their post, all of the comments and votes will also get deleted
Remember that the data dump is cleaned so user voting isn’t recorded, so we’ll steer clear of that. I also have no idea how to simply implement the removal of badges someone may have gotten from an associated post, so I’m leaving that out. The badge stays in the picture!
For the post and vote deletion — in real life, this would also be really complicated. We’d have to adjust user reputation, and info we maintain about how many up and down votes the user has cast. We’d also have to restore reputation to users if an answer they downvoted was deleted. This is well beyond what foreign keys can do well, or easily. That’s implementing business logic, which would be a better fit for a trigger, if we’re sticking to doing it in the database.
What can foreign keys do for us, then?
Foreign Keys offer you the option to have updates and deletes cascade from the parent (reference) table to the child table. We’ll look at how exactly what works in the next post, but for now let’s just try to get the foreign keys set up.
Issues, Tissues
In order to make testing this easier, I added some code to our stored procedure that drops indexes.
Now it can also drop foreign key, unique, and check constraints.
You can thank me later.
First, let’s get badges deleted if a user self-immolates.
1 2 3 |
ALTER TABLE dbo.Badges ADD CONSTRAINT fk_badges_users_id FOREIGN KEY (UserId) REFERENCES dbo.Users(Id) ON DELETE CASCADE; |
Everything looks good so far! Let’s get comments deleted, too.
1 2 3 4 5 6 7 8 |
ALTER TABLE dbo.Comments ADD CONSTRAINT fk_comments_users_id FOREIGN KEY (UserId) REFERENCES dbo.Users(Id) ON DELETE CASCADE; ALTER TABLE dbo.Comments ADD CONSTRAINT fk_comments_posts_id FOREIGN KEY (PostId) REFERENCES dbo.Posts(Id) ON DELETE CASCADE; |
Also no problem. Okay, now let’s get votes deleted if a post goes away.
1 2 3 |
ALTER TABLE dbo.Posts ADD CONSTRAINT fk_posts_users_id FOREIGN KEY (OwnerUserId) REFERENCES dbo.Users (Id) ON DELETE CASCADE; |
Alright, now let’s get posts deleted if a user cashes out.
1 2 3 |
ALTER TABLE dbo.Posts ADD CONSTRAINT fk_posts_users_id FOREIGN KEY (OwnerUserId) REFERENCES dbo.Users (Id) ON DELETE CASCADE; |
This goes less well.
1 2 3 4 5 |
Msg 1785, Level 16, State 0, Line 190 Introducing FOREIGN KEY constraint 'fk_posts_users_id' on table 'Posts' may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints. Msg 1750, Level 16, State 1, Line 190 Could not create constraint or index. See previous errors. |
Orderly
If you’ve ever hit this error, you’ve no doubt landed on The Least Helpful Page On The Internet®
If you haven’t, I’ll summarize it for you: DON’T DO WHAT YOU’RE TRYING TO DO
Now, when I was testing this, I found that I’d hit that error in different places if I created things in a different order.
As a for-instance, if I start from the bottom up, I hit it trying to add a constraint from comments to posts.
1 2 3 4 5 |
Msg 1785, Level 16, State 0, Line 143 Introducing FOREIGN KEY constraint 'fk_comments_posts_id' on table 'Comments' may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints. Msg 1750, Level 16, State 1, Line 143 Could not create constraint or index. See previous errors. |
It seems that I don’t have a straightforward way of getting both things to happen. I could try chaining things differently…
1 2 3 |
ALTER TABLE dbo.Comments ADD CONSTRAINT fk_comments_posts_id FOREIGN KEY (PostId) REFERENCES dbo.Votes(PostId) ON DELETE CASCADE; |
But the PostId column in Votes isn’t unique — the only truly unique columns in Stack tables are the Id columns. I’ll get this error.
1 2 3 4 5 |
Msg 1776, Level 16, State 0, Line 155 There are no primary or candidate keys in the referenced table 'dbo.Votes' that match the referencing column list in the foreign key 'fk_comments_posts_id'. Msg 1750, Level 16, State 1, Line 155 Could not create constraint or index. See previous errors. |
I can’t create a unique index or constraint just on PostId anywhere, and even the combination of PostId and other columns can be non-unique.
I can make a unique index on UserId and Id, though.
1 |
CREATE UNIQUE NONCLUSTERED INDEX ix_ennui ON dbo.Votes (PostId, Id) |
But now I’m dealing with a multi-key foreign key, which’ll have some drawbacks that we’ll explore in a later post.
1 2 3 |
ALTER TABLE dbo.Comments ADD CONSTRAINT fk_comments_posts_id FOREIGN KEY (PostId, Id) REFERENCES dbo.Votes(PostId, Id) ON DELETE CASCADE; |
More locally, we’ll run into this error:
1 2 3 4 5 |
Msg 1788, Level 16, State 0, Line 180 Cascading foreign key 'fk_comments_posts_id' cannot be created where the referencing column 'Comments.Id' is an identity column. Msg 1750, Level 16, State 1, Line 180 Could not create constraint or index. See previous errors. |
No foreign keys allowed on identity columns.
You Ain’t Got No Alibi
By now, we’re starting to ask ourselves if these cascading actions are worth it. If we don’t implement them, we can have referential integrity out the wazoo. And those trigger thingamajigs are starting to look mighty attractive. They don’t have all these “rules” to deal with, and we can offload the deletions there. Heck, if we’re using a stored procedure, we don’t even need triggers. We can just fire off deletes to any tables we care about from there.
Foreign Keys with cascading actions: Strike One. In the next post, we’ll learn why I’d give them a lifetime ban from baseball. Thanks for reading!
4 Comments. Leave new
There seems to be a copy/paste issue resulting in fk_posts_users_id being created twice, once for “Okay, now let’s get votes deleted if a post goes away” and once for “Alright, now let’s get posts deleted if a user cashes out.” It’s the wrong code for that first one.
Yes, think so too. Should be dbo.votes
Looking forward to the next part in this series as I suspect it’s about performance. We run into _that_ fun all the time when trying to clear out the history in the SSISDB. MS decided to implement cascading deletes so deleting 1 row can affect millions of rows behind the scenes. :/
I haven’t tried to nest anything this deep. Whenever our devs try to reach for the convenience of a cascading delete, I encourage calling a stored procedure instead. We can control the order and size and don’t run into crazy issues like the one above.
Peter,
Yes, Microsoft set 2012 up for failure with cascading deletes, lack of batching, and missing indexes:
http://www.nikoport.com/2014/01/26/fixing-ssisdb-performance-troubles-kb-2829948/
http://www.ssistalk.com/2013/01/31/ssis-2012-catalog-indexing-recommendations/
It also defaults to keeping an absurd amount of data – 365 days worth (https://www.mssqltips.com/sqlservertip/3307/managing-the-size-of-the-sql-server-ssis-catalog-database/). Before the index fix came out I ran a process to change the retention setting every night for weeks to eventually get it down to just 14 days. Because every time I dropped the retention, the maintenance job would run for hours to delete the now out of date rows.
But they did eventually fix it with a CU (both SQL Server 2012 CU7 and SQL Server 2012 SP1 CU4) that at least contained the proper indexes. I don’t know that they ever batched the deletes as I haven’t had to deal with SSIS for several years.