Keep it Constrained

Last Updated 7 years ago

SQL Server has this fancy feature called constraints. Database constraints are used to restrict the domain (the range of allowed values) of a given attribute. That’s just a funny way of saying: through a set of carefully crafted rules, we control the shape of our universe.

Our Test Table

We’re going to be testing with the following tables:

[code lang=”sql”] CREATE TABLE sandwiches
(
id INT IDENTITY(1,1),
title VARCHAR(60),
price MONEY
);
[/code]

Unique Constraints

A unique constraint guarantees that one and only one row in the table can have a specific value. If our application requires there is only one sandwich called “The Super Big Elvis”, we need some way to make sure that the database can’t contain two “The Super Big Elvis” sandwiches.

A naive approach would involve application code checking the database for the existence of data. This approach has several problems:

Every application that connects to the database must either duplicate this logic or else use a centralized service to check.
There’s no guarantee that another application won’t check for data and save faster.

Instead of requiring our applications to handle this integrity, we can push the uniqueness requirement down to the database using a unique constraint:

[code lang=”sql”] ALTER TABLE sandwiches
ADD CONSTRAINT UQ_sandwiches
UNIQUE ( title );
[/code]

There’s one important gotcha with SQL Server: only one NULL value is allowed per column (or set of columns) in the unique constraint. If you had a table with two columns, a and b, there are only three possibilities for rows that involve NULL:

a: NULL, b: anything
a: anything, b: NULL
a: NULL, b: NULL

If you need a uniqueness constraint to ignore NULL values, then you’ll need to create a unique filtered index like this:

[code lang=”sql”] CREATE UNIQUE INDEX UX_customers_full_name
ON customers(first_name, last_name)
WHERE first_name IS NOT NULL
AND last_name IS NOT NULL;
[/code]

Now we can at least make sure that the uniqueness applies to people with both a first name and last name.

The Primary Key Constraint

A primary key is a logical construct – it’s a set of attributes that cannot be duplicated by any other row in the table. The primary key doesn’t have to be a single column, it can be a composite key – as long as the columns uniquely identify a single row it’s a valid primary key.

Don’t confuse a primary key with a clustered index, though. In SQL Server, the default behavior is to create a primary key as a clustered index – the clustered index defines the physical order of data in the table. This has led to a slew of iffy advice like “never use natural primary keys” or “don’t use GUIDs as primary keys”.

When thinking about primary keys as constraints, concern yourself first with identifying the unique characteristics of the rows for your application. Follow that up with physical optimization decisions once you’re ready to implement the physical model.

[code lang=”sql”] ALTER TABLE sandwiches
ADD CONSTRAINT pk_sandwiches
PRIMARY KEY (id);
[/code]

Aww yeah, we’ve got a primary key. The default behavior for SQL Server is to create a clustered index under the PK if one doesn’t exist already. If you need to use a different index as the clustered index (e.g. the PK is a GUID and the clustered index is an integer) just make sure that you create the clustered index before you create the PK or that you specific the primary key as a non-clustered index:

[code lang=”sql”] ALTER TABLE sandwiches
DROP CONSTRAINT pk_sandwiches;

ALTER TABLE sandwiches
ADD unique_id UNIQUEIDENTIFIER DEFAULT(NEWID())

CREATE UNIQUE CLUSTERED INDEX CX_sandwiches ON sandwiches(unique_id);

ALTER TABLE sandwiches
ADD CONSTRAINT pk_sandwiches
PRIMARY KEY NONCLUSTERED (id) ;
[/code]

What do PKs buy us? A primary key gets you the same thing as a unique constraint with the added benefit of saying, “This is the primary identifier for this row. Other attribute combinations may be unique, but this combination of attributes identifies this entity in the database.”

Check Constraints

Check constraints give you more flexibility than the other forms of constraints. Both unique constraints and primary key constraints operate on a list of columns that we supply – they are limited to simple combinations of columns. Check constraints, however, give you something more.

On the surface, as check constraint is just as simple as any other constraint – the statement in the check constraint has to evaluate to true. Seems simple, right?

The upside of this approach is that “any statement” means that any valid T-SQL statement can be part of the check constraint… so long as the T-SQL statement doesn’t rely on access to another table. So, we can do something like this:

[code lang=”sql”] CREATE FUNCTION fn_check_sandwich_name
(
@sandwich_name VARCHAR(60)
)
RETURNS BIT
AS
BEGIN
IF LTRIM(RTRIM(@sandwich_name)) = ‘Big Mac’
RETURN 0;
RETURN 1;
END;
[/code]

We want to avoid lawsuits with a certain fast food chain, so this function should keep us safe. In order to use the function a check constraint we can just do the following:

[code lang=”sql”] ALTER TABLE sandwiches
WITH CHECK
ADD CONSTRAINT CK_sandwich_name
CHECK (dbo.fn_check_sandwich_name(title) = 1);
[/code]

Let’s try it out:

[code lang=”sql”] INSERT INTO sandwiches (title, price) VALUES (‘The Big Ozar’, 12.88);
INSERT INTO sandwiches (title, price) VALUES (‘Big Mac’, 2.99);
[/code]

The first row will successfully inserted into the sandwiches table. The second insert attempt will fail – the title matches the failure condition in the function fn_check_sandwich_name.

There’s at least one scenario that this doesn’t catch: NULLs. With code in its current state, we can insert NULL in the title column of sandwiches:

[code lang=”sql”] INSERT INTO sandwiches (title, price) VALUES (NULL, 9999.99);
[/code]

Functions used as check constraints should explicitly check for NULL inputs; without taking NULL into account, it’s possible to create a check constraint that will allow incorrect data to end up in a table. Although these data rules should be pushed down to the underlying schema where it’s easy to declare a column as NOT NULL, any check constraints that operate on incoming data need to make sure they account for NULL.

About That World…

Using a combination of table constraints, we can build a complete understanding of the totality of the data in the database without having to execute a single query. Constraints let us push universal rules about the total state of data down into the database where all applications can take advantage of them and where those rules only need to be maintained in one place.

Interview with Me on SQL Server Radio

Always On Availability Groups, Backup Checksums, and Corruption

8 Comments. Leave new

nordic
December 10, 2013 9:52 am

What colors do you use for SSMS? As in the syntax highlighter? It is possible to send export settings? 🙂

Reply
- Brent Ozar
  December 10, 2013 2:19 pm
  
  Nordic – the syntax highlighter is part of WordPress, not SSMS. WordPress is our blogging platform.
  
  Reply
  - nordic
    December 10, 2013 2:38 pm
    
    I understand. But these colors would be fantastic for SSMS. 🙂
    
    Reply
    - Nicholas Peterson
      December 11, 2013 9:34 am
      
      A quick google for ‘Solarized’ and SSMS turned up this blog where the author made a registry tweak to get the color scheme in ssms closer to what you want.
      
      http://www.jimmcleod.net/blog/index.php/2012/05/03/beautify-your-management-studio-20052008/
      
      Reply
Greg
December 18, 2013 12:56 pm

Could you please elaborate a little about the restriction on multiple NULL values in a UNIQUE constraint? I can reproduce the behavior you describe when the UNIQUE constraint involves a single column

create table dbo._test(
pkey int NOT NULL,
a int NULL,
primary key clustered (pkey),
constraint UQ__test unique nonclustered (a)
)
insert _test values (1,NULL)
insert _test values (2,NULL)

produces the following error:

Msg 2627, Level 14, State 1, Line 8
Violation of UNIQUE KEY constraint ‘UQ__test’. Cannot insert duplicate key in object ‘dbo._test’. The duplicate key value is ().

However, when the UNIQUE constraint involves two columns I find that multiple NULL values are allowed:

create table dbo._test(
pkey int NOT NULL,
a int NULL,
b int NULL,
primary key clustered (pkey),
constraint UQ__test unique nonclustered (a,b)
)
insert _test values (1,1,NULL)
insert _test values (2,2,NULL)
insert _test values (3,NULL,1)
insert _test values (4,NULL,2)
select * from _test order by 1,2,3

Produces the following result:

pkey a b
———– ———– ———–
1 1 NULL
2 2 NULL
3 NULL 1
4 NULL 2

Reply
- Jeremiah Peschka
  December 18, 2013 12:59 pm
  
  Sure thing! In your first example, your rows aren’t unique – SQL Server views both NULLs as equal. In your second example, all of your rows are unique – the uniqueness constraint is across all columns in the constraint.
  
  If you take your example one step further and attempt to run insert _test values (5,1,NULL), you should find that SQL Server produces the same error.
  
  Reply
  - Greg
    December 18, 2013 1:10 pm
    
    Thank you , I get it now. I normally never allow NULLs in a column that’s part of a UNIQUE constraint and the last time I did I was using an ANSI compliant RDBMS so SQL Server’s behavior is a bit strange for me.
    
    Reply
    - Jeremiah Peschka
      December 18, 2013 1:14 pm
      
      Yeah, the non-default behavior of SQL Server is… vexing. I don’t find myself simulating ANSI compliant behavior very often, but I’ve never taken the time to find out if people are designing systems around the limitations of SQL Server or if they’re already aware of this issue.
      
      Reply