It’s one of those fierce religious wars that divides families and tears loved ones apart.
First, if two tables are related, should we tell SQL Server about it? If SQL Server knows that there’s a relationship, then it can make better decisions when building execution plans. The problem is that SQL Server has to be able to actually trust that relationship. Often, folks will disable keys and constraints temporarily in order to make a big operation go faster, like a bulk load, and then enable them again afterwards. However, if you don’t tell SQL Server to check the data afterwards, it simply doesn’t trust the constraints. I see it all the time in sp_Blitz warnings of “Foreign Keys or Check Constraints Not Trusted.” In those cases, the keys & constraints aren’t really helping your query plans.
Next, should we have SQL Server re-check our data? Once we’ve realized that our keys & constraints aren’t trusted, what do we do about it? We’ll go re-check the keys & constraints, which is an intensive operation that other users on the box will notice. Try it on a restored copy of production first, like in your development environment, because you’re likely going to discover that some of your data doesn’t match your defined relationships. We’ve got orphans, and they’re not as adorable as Annie.
Then what do we do about the bad data? You’ll need to clean that up, and it’s likely going to involve business discussions about data validity. How’d the junk get in there in the first place? How do we make sure it doesn’t happen again?
Finally, do your fixes in production. Script out the changes first in dev to delete the bad data and re-check the existing keys & constraints, then do it live. The changes are going to be logged operations, so the bigger they are, the more you have to look out for log shipping subscribers, replication subscribers, database mirrors, AG replicas, etc.
Is it all worth it? I’ve never seen a performance tuning case where I’ve said, “The one thing that will take you across the finish line is to have trusted foreign keys and constraints.” Usually, the simpler/easier/faster fix is to tune the queries and/or indexes. Plus, it’s an easier battle to fight with developers – who often don’t want to have any perceived overhead of enforcing foreign key relationships inside the database.
Even worse, sometimes the keys are the problem. If sp_Blitz reports serializable locking, your application may be doing updates and deletes, and relying on cascading updates & deletes to clean up child records. (You can learn more about this in Klaus Aschenbrenner’s key range locks demo.)
This is why I don’t pick a side on this particular religious war – neither side is right all the time, hahaha.
To learn more about the gotchas, read Erik’s 5-part adventure in setting up foreign keys in the Stack Overflow database, and why they didn’t really pay off:
Yep, I talk about that in the first paragraph. Make sure to read that one. 😉
Is it possible to happen that I can’t re-enable check constraint?
Maty – yep, like if you’ve got bad data in there that violates your constraints.
How can I find this data?
Heres an example Maty
CREATE TABLE [dbo].[Address]
[AddressId] INT NOT NULL
,[Street] NVARCHAR(255) NOT NULL
,[City] NVARCHAR(255) NOT NULL
,[ZipCode] NVARCHAR(25) NOT NULL
,CONSTRAINT [PK_Address] PRIMARY KEY ([AddressId])
CREATE TABLE [dbo].[Customer]
[CustomerId] INT NOT NULL
,[Name] NVARCHAR(100) NOT NULL
,CONSTRAINT [PK_Customer] PRIMARY KEY ([CustomerId])
,CONSTRAINT [FK_Customer_AddressId] FOREIGN KEY([AddressId]) REFERENCES [dbo].[Address]([AddressId])
ALTER TABLE [dbo].[Customer] NOCHECK CONSTRAINT [FK_Customer_AddressId];
Bulk fill the tables….
–Find customers with missing addresses
SELECT [CustomerId],[Name],[AddressId] FROM [dbo].[Customer] WHERE [AddressId] NOT IN (SELECT [AddressId] FROM [dbo].[Address])
Here’s another code snippet that scans all your DBs for non-trusted constraints and PRINTs out a checking statement for each:
I can’t help but feel that if you got crap data in your tables once, it’ll happen again.
Enabling constraints will stop this happening, so it’s gotta be a good thing to have them.
As for checking the existing data, I agree that performance gains seem theoretical rather than real world, but it’s gotta be worth at least using the constraints to check the validity of the existing data.
If the quality of the data isn’t important then you’re probably better off saving money on the SQL Server licence fee and using something that instead
If your not going to have foreign keys in your database design and enforce them, then do you have any need for primary keys? Just have unique, not null constraints with a clustered or non clustered index.
Some folks do indeed do that – they just roll with unique clustered indexes.
In my environment I’m pretty sure there is a significant benefit from the join elimination enabled by trusted foreign key constraints. But that’s a data warehouse.
I guess if we had a system where the work pattern was more predictable you could more easily design the views/queries to avoid unnecessary joins rather than designing a generalised set of views.
‘Is it all worth it? I’ve never seen a performance tuning case where I’ve said, “The one thing that will take you across the finish line is to have trusted foreign keys and constraints.”’
Yeah but I’ve lost count of the number of times customers say: “I can no longer rely on your piece of s…t application because the data in it is wrong. Screen 1 says I have 10 widgets, screen 2 says I have 11 widgets”
(because one does a join and one does not).
Doesn’t matter how fast your database queries are if they return junk.
Connor – got some bad news here: foreign keys don’t always fix that. Sometimes it’s just bad data modeling or bad querying – for example, if the join you’re discussing points to an optional table that shows where the widgets are stored in inventory, or widgets by parent part. If there’s no rows in these optional tables, and some queries join but some don’t, the data still smells bad.
But good news – we both agree that your application is a piece of s..t. It’s just more widespread than you’d feared.
I KIIID, I KIIID
“got some bad news here: foreign keys don’t always fix that”
True…but they’re going to save you from the “wtf” moment when you laboriously go through all of your queries and make sure they are indeed valid…and then still find you’re returning crap to the customer 🙂
I’ve worked with an vendor custom application.. or maybe crap-lication is a better word that has no foreign or constraints keys enforced. The code tends to break if the database rejects any database operations. (Yes the developers are working on this, but the vendor also didn’t understand the principles of object oriented development either…)
I’ve had to build a SQL application that identifies foreign key, constraint, and business rule violations in the data because it’s a primary tool for about 500 of my coworkers. I spend 30 minutes to an hour every day reviewing the results of it’s testing. And fixing the issues that occasionally arise.
I’ve done this for 6 years. Life without good data architecture can be hell. You don’t necessarily need every foreign key applied and enforced, but life is much easier when you do.
I prefer to enforce the foreign keys in the database 99.9% of the time. There are some exceptions but IMO very few. Mostly, in my experience, if we do not the application developers ALWAYS manage to mess up the application and insert invalid data causing me grief and ultimately the question from the powers to be – why did you let this happen in the first place? AARRGGHHH!
Exception, my bad… reporting database replicated to from production. We do not replicate the foreign keys, since the source of this data is already considered trusted!
This yells for classy Daily WTF article:
Is it advisable to have foreign key between 2 tables in across schemas ? If not can i use check constraint ?
Sure, you can do that. It doesn’t make a difference which schema they’re in, as long as they’re in the same database.