The only thing you ever need to use for database identity is an
IDENTITY, right? Well, maybe. There are a lot of different options and they all have different pros and cons.
The default way to identify objects in SQL Server is to use an
BIGINT column marked as an
IDENTITY. This guarantees relatively sequential numbers, barring restarts and failed inserts. Using identity columns put the responsibility for creating and maintaining object identity in the database.
SQL Server will cache
IDENTITY values and generate a new batch of identity values whenever it runs out. Because identity values are cached in memory, using identity values can lead to jumps in the sequence after SQL Server is restarted. Since identities are cached in memory in large batches, they make it possible to rapidly insert data – as long as disks are fast enough.
Sometimes the application needs more control over identity. SQL Server 2012 added sequences. A sequence, unlike an identity value, is a separate object in the database. Both application and database code can read from the sequence – multiple tables can share a sequence for an identity column or separate sequences can be created for each table.
Developers using a sequence can use the
CACHE value to cache a specific number of sequence values in memory. Or, if the application should have minimal gaps in the sequence, the
NOCACHE clause should be used.
The Problem with Sequential Identities
SEQUENCE values keep identity generation squarely in the database and, by using integral values, they keep the value narrow.
You can run into problems with sequential inserts on very busy systems – this can lead to latch contention on the trailing pages of the clustered index. This issue can be resolve by spreading inserts across the table by using a
GUID or some other semi-random clustering key. Admittedly, most systems are never going to run into this problem.
GUIDs for Object Identity
Some developers use GUIDs as a way of managing object identity. Although database administrators balk at this, there are good reasons to use GUIDs for object identity.
GUIDs let the application generate object identity. By moving object identity out to the application layer, users can do work in memory and avoid multiple round trips to the database until they’re ready to save the entire set of data. This technique gives tremendous flexibility to application developers and users.
There’s one other thing that a well designed application gets from this technique – independence from the database. An application that generates its own identity values doesn’t need the database to be online 24/7; as long as some other system is available to accept writes in lie of the database, the application still function.
Using GUIDs for object identity does have some issues. For starters, GUIDs are much wider than other integral data types – 16 bytes vs 4 bytes (
INT) or 8 bytes (
BIGINT). This is a non-issue for a single row or even for a small database, but at significant scale this can add a lot of data to the database. The other issue is that many techniques for generating sequential GUIDs in the application (see NHibernate’s GuidCombGenerator) can still run into GUID collisions.
What if you could get the best of both worlds? Applications generating unique identities that are also sequential?
The point of identity generation is to abstract away some portion identity from data attributes and provide an independent surrogate value. GUIDs can provide this, but they aren’t the perfect solution. Identity generators like flake or rustflakes promise roughly sequential identity values that are generated in the application layer and are unique across multiple processes or servers.
The problem with an external identity generator is that it is an extra piece of code that developers need to manage. External dependencies carry some risk, but these are relatively safe items that require very little effort implement and maintain.
There’s no right solution, there’s only a solution that works for you. You may even use each solution at different points in the lifecycle of the same product. It’s important, though, for developers and DBAs to be aware of how identity is currently being handled, the issues that can arise from the current solution, and ideas of how to handle it going forward.